PREFACE


RAND WAS COMMISSIONED by the Office of the Assistant Secretary of De- 
fense (Systems Analysis) to prepare a book on the subject of military 
equipment cost-estimating procedures. This memorandum deals with funda- 
mentals of cost analysis and constitutes the introductory portion of 
such a book. In addition to the material presented here, the complete 
book will deal with uncertainty, methods and techniques for estimating 
costs of military equipment such as aircraft, and cost models. Emphasis 
is placed on cost-estimating techniques that are applicable across a 
broad spectrum of major military equipment. Consequently, it is hoped 
that this memorandum, which represents a selection of the more general 
areas covered in the book, will be useful throughout the Department of
Defense and the aerospace industry.


SUMMARY


THIS MEMORANDUM is a compilation of topics related to equipment cost 
estimating. These topics are treated in five separate sections: (1) 
cost-estimating methods, (2) data collection and adjustment, (3) sta- 
tistical methods in development of estimating relationships, (4) use 
of cost-estimating relationships, and (5) the learning curve. 

There are three basic methods used for cost estimation--the indus- 
trial engineering, analogy, and statistical approaches. The industrial 
engineering approach represents an examination of separate segments of
work at a low level of detail and a synthesis of the many detailed es- 
timates into a total. The method of analogy is based on direct com- 
parisons with historical information on like components of existing 
systems. In the statistical approach, as defined in this memorandum, 
estimating relationships with parametric explanatory variables, such 
as weight, speed, power, frequency, and thrust, are used to predict 
cost. This is usually applied at a higher level of aggregation than the
industrial engineering approach. 

Of the three approaches to cost estimating, statistical methods 
are considered to be the most useful for government analysts in a wide 
range of application, whether the purpose is long-range planning or 
contract negotiation. Any estimating method, however, is basically a 
projection from past experience, and to make this projection it is nec- 
essary to have a reliable data base. This must include information on 
the cost, physical and performance characteristics, and on the develop- 
ment and production history of previous hardware programs. In addition, 
because the data must be comparable to be useful, adjustments must be 
made for definitional differences, production quantity differences,
and yearly price changes.
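
The adjustment for yearly price changes mentioned above can be sketched as a conversion of then-year costs to constant dollars of a chosen base year. This is a minimal illustration only: the index values, the base year, and the function name are assumptions introduced here and are not taken from the memorandum.

```python
# Hypothetical price index by year (base year 1965 = 100).
# These values are invented for illustration, not historical data.
PRICE_INDEX = {1960: 88.0, 1963: 95.0, 1965: 100.0}


def to_base_year_dollars(cost, year, base_year=1965, index=PRICE_INDEX):
    """Convert a cost recorded in `year` dollars to `base_year` dollars."""
    return cost * index[base_year] / index[year]


# A cost of 880 recorded in 1960 dollars, restated in 1965 dollars.
adjusted = to_base_year_dollars(880.0, 1960)
print(round(adjusted, 2))
```

Only after costs from different years are restated in this way are they comparable enough to be used together in one data base.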

In the discussion on statistical methods, a hypothetical example
is used to demonstrate the procedures and techniques of this method. 
First, attention is given to a simple linear regression, with a single 
explanatory variable. Next, a logarithmic transformation of this re- 
lationship is treated. Finally, multiple regressions are performed in 
various pairwise combinations of three explanatory variables. These 
multiple regressions are performed for both linear and nonlinear (log- 
arithmic) relationships. 

The limitations of estimating relationships stem from two sources: 
first, the uncertainty inherent in any application of statistics; and, 
second, the uncertainty that an estimating relationship is applicable 
to a particular situation. Important considerations that can be easily 
overlooked during a purely formal statistical analysis include (1) the 
reasonableness and structural soundness of the estimating relationship, 
(2) the importance of the analyst's familiarity with the actual hard- 
ware, and (3) systematic bias by the analyst. Although the value of 
statistical estimating relationships should not be discounted (their 
widespread use and general applicability attest to their worth), cau- 
tion is recommended in applying these relationships outside the data 
base from which they were derived. 

The last section covers the subject of learning curves, which are 
used to predict reductions in cost as the number of items produced in- 
creases. The learning process prevails in many industries, and its 
existence has been verified by empirical data. The factors that account 
for this learning trend are generally attributed to such items as job 
familiarization, development of more efficient tools, and improvement 
in overall management. The basis of learning-curve theory is that each 
time the total quantity of items produced doubles, the cost per item 
is reduced to a constant percentage of its previous cost. Such a rela- 
tionship (log-linear) may be expressed in terms of unit cost or cumula- 
tive average cost. In practice, the unit cost is most frequently con- 
sidered to be linear, but there are sufficient exceptions to suggest 
that the choice must be based on experience. 
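
The doubling rule stated above can be written as cost(n) = first-unit cost times n raised to the power log2(slope), where the slope is the constant percentage expressed as a fraction. The sketch below applies this to the unit-cost form; the function name and the 80 percent slope are assumptions chosen for illustration.

```python
import math


def unit_cost(first_unit_cost, n, slope):
    """Cost of unit n on a log-linear unit curve.

    `slope` is the fraction to which cost falls each time the
    quantity doubles (e.g., 0.8 for an "80 percent" curve).
    """
    exponent = math.log2(slope)  # negative when slope < 1
    return first_unit_cost * n ** exponent


# With an assumed 80 percent curve and a first-unit cost of 100,
# each doubling of quantity multiplies unit cost by 0.8.
for n in (1, 2, 4, 8):
    print(n, round(unit_cost(100.0, n, 0.8), 2))
```

The same functional form applies when the log-linear relationship is assumed to hold for cumulative average cost rather than unit cost; only the interpretation of the dependent variable changes.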

When learning curves are displayed graphically, the problem arises
of how to plot the average cost for a lot or a complete contract, since,
typically, man-hours or costs are not recorded for each unit. For the
cumulative average curve, the plot point is simply the endpoint of
each lot, since this is the point where the cumulative average figure
is applicable. For the unit curve, calculating the plot point is more
complex, and approximations are widely used. The plotting of
representative unit costs for contract lots is of importance, especially
the early points, whose misplacement could lead to improper conclusions
about the cost-quantity relationship.
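
One way to locate the plot point for a lot on the unit curve is to find the unit number at which the unit cost equals the lot's average cost. The sketch below does this by direct summation for a known slope; it is an illustration under that assumption, not one of the approximations referred to above, and the function name is introduced here.

```python
import math


def lot_plot_point(first_unit, last_unit, slope):
    """Return (plot_unit, relative_average_cost) for a production lot.

    Costs are relative to a first-unit cost of 1.0. `slope` must be
    less than 1.0 so that the learning exponent is nonzero.
    """
    b = math.log2(slope)
    units = range(first_unit, last_unit + 1)
    average = sum(n ** b for n in units) / len(units)
    # Invert cost = n**b to find the unit number whose cost is `average`.
    plot_unit = average ** (1.0 / b)
    return plot_unit, average


# First lot of ten units on an assumed 80 percent curve.
unit, avg = lot_plot_point(1, 10, 0.8)
print(round(unit, 2), round(avg, 4))
```

For early lots the resulting plot point lies well before the arithmetic middle of the lot, which is why misplacing the early points distorts the apparent slope.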

In the application of learning curves to problems associated with 
cost estimating, the analyst must be cognizant of the wide variations 
possible and the reasons for such variations. A thorough knowledge of 
the learning-curve phenomenon is indispensable to persons involved in
cost analysis.


I. COST-ESTIMATING METHODS


A COST ESTIMATE is a judgment or opinion regarding the cost of an ob- 
ject, commodity, or service. This judgment or opinion may be arrived 
at formally or informally by a variety of methods, all of which are 
based on the assumption that experience is a reliable guide to the 
future. In some cases the guidance is clear and unequivocal; e.g., 
bananas cost 15¢ per pound last week; it is estimated that they will 
cost about 15¢ per pound next week, barring unforeseen circumstances 
such as a freeze in Guatemala. At a more sophisticated level, aver- 
age costs are calculated and used as factors to estimate the cost to 
excavate a cubic yard of earth, to fly an airplane for an hour, or to 
drive an automobile a mile. Much, perhaps most, estimating is of this 
general type, i.e., where the relationship between past experience and 
future application is fairly direct and obvious. 

The more interesting problems, however, are those in which the 
relationship is unclear, because the proposed item differs in some 
significant way from its predecessors. The challenge to cost analysts 
concerned with military hardware is to project from the known to the 
unknown, to use experience on existing equipment to predict the cost 
of next-generation missiles, aircraft, and space vehicles. The chal- 
lenge is not only in new equipment designs; new materials, new produc- 
tion processes, and new contracting procedures also add to uncertainty. 
These innovations are sometimes accompanied by expectations of cost in- 
creases or of cost reductions that must be carefully evaluated. 

The techniques used for estimating hardware cost range from
intuition at one extreme to a detailed application of labor and material
cost standards at the other. One of the military services’ manuals on
cost estimating lists five basic methods--industrial engineering stan- 
dards; rates, factors, and catalog prices; estimating relationships; 
specific analogies; and expert opinion. Other sources put the number 
at two (synthesis and analysis), three (round-table estimating, esti- 
mating by comparison, and detailed estimating), or four (analytical 
appraisal, comparative analysis, statistical analysis, and use of stan- 
dards). In this section, the discussion will be limited to three tech- 
niques--the industrial engineering approach, analogy, and the statisti- 
cal approach--and it is the latter that will be of primary concern 
throughout the remainder of the memorandum. 

Estimating by industrial engineering procedures can be broadly 
defined as an examination of separate segments of work at a low level 
of detail and a synthesis of the many detailed estimates into a total. 
Statistical estimating is sometimes defined as a statistical extrapo- 
lation to produce an estimate-at-completion after progress has been 
made on a job and costs or commitments have been experienced, but this 
is not the sense in which the term is used in this study. In the sta- 
tistical approach, estimating relationships that use explanatory vari- 
ables such as weight, speed, power, frequency, and thrust are relied 
on to predict cost at a higher level of aggregation. Figure 1 illus- 
trates this difference in level of detail. At the lowest level of de- 
tail, the estimator begins with a set of drawings and specifies each 
engineering task, tool requirement, or production operation, including 
the labor and material required. This is sometimes referred to as 
"grass-roots" estimating. 

Table 1 illustrates the detail required at the lowest level of 
estimating; in this case a labor cost estimate for forming a steel 
center bracket. The name and number of the operations and the machines 
that will be used are given with estimates of setup and operating time 
and labor cost. When they exist, standard setup and operating costs 
are used in making estimates, but if standards have not been estab- 
lished (which is frequently the case in the aerospace industry), a 
detailed study is made to determine the most efficient method of
performing each operation. A standard may be a "pure" standard or an
"attainable" standard, but for a specified condition, it is essentially
the minimum time required to complete a given operation and
theoretically should be approached asymptotically when the planned
production rate is attained.


STATISTICAL PROCEDURE

    Engineering direct labor hours
    Engineering materials
    Engineering direct charges
    Tooling direct labor hours
    Tooling materials and purchased tools
    Tooling direct charges
    Quality control direct labor hours
    Quality control direct charges
    Manufacturing direct labor hours
    Manufacturing materials and purchased parts
    Manufacturing direct charges
    Purchased equipment

INDUSTRIAL ENGINEERING PROCEDURE

    Number of engineers by department and task
    Type and quantity of materials and test equipment
    Type of direct charge: computer rental, reproduction services,
        travel and per diem
    Type and quantity of specific tools required
    Type of direct charge: equipment rental, blueprint services
    Work center and station requirements or percentage of direct
        labor hours
    Type of direct charge: reproduction services
    Tasks by manufacturing processes: fabrication, subassembly, final
        assembly, and checkout
    Parts list, specific type and quantity of raw materials, scrap
        and rejects
    Type of direct charge: reproduction services, travel and per diem
    Parts list items: landing gear, environmental control, secondary
        power, instruments

Fig. 1--Levels of aggregation for estimating purposes
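
The kind of grass-roots labor estimate described for Table 1 can be sketched in a few lines: setup and operating times are listed by operation, summed, and priced at a labor rate. The operations, times, quantity, and wage rate below are hypothetical placeholders, not values from Table 1.

```python
# Each operation: (name, setup_hours, operating_hours_per_piece).
# All values are invented for illustration.
OPERATIONS = [
    ("shear blank", 0.10, 0.002),
    ("pierce holes", 0.25, 0.004),
    ("form bracket", 0.40, 0.006),
]


def labor_cost(operations, quantity, rate_per_hour):
    """Return (total_hours, total_cost) for a run of `quantity` pieces."""
    hours = sum(setup + run * quantity for _, setup, run in operations)
    return hours, hours * rate_per_hour


hours, cost = labor_cost(OPERATIONS, quantity=1000, rate_per_hour=4.0)
print(round(hours, 2), round(cost, 2))
```

Even this small example shows why the approach is data-intensive: every operation needs its own time values before any total can be formed.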

Standards are not widely used in the aerospace industry for esti- 
mating costs, although they are used extensively for other purposes, 
such as control of shop performance. Standards are best applied when 
a long, stable production run of identical items is envisaged; in the 
aerospace industry, however, emphasis is often placed on development 
rather than on production. The Gemini program provides an extreme 
example: Twelve spacecraft of varying configurations were developed 
and produced at a cost of $700 million. Other examples would be less 
dramatic, but it is true that compared with industry in general, pro- 
duction runs of advanced military and space hardware tend to be short, 
and both design configurations and production processes may continue 
to evolve even after several hundred units have been completed. This 
means that standards are continually changing--one standard applies 
at unit 50, another at other production quantities. Because changes 
are unpredictable, it is difficult to establish standards that will 
be applicable at some specified production quantity in advance of 
production experience. 

Industrial engineering estimating procedures require consider- 
ably more personnel and data than are likely to be available to gov- 
ernment agencies under any foreseeable conditions. One of the largest 
aerospace firms judges that the use of this approach in estimating the 
cost of an airframe requires about 4500 estimates; for this reason, 
the firm avoids making industrial engineering estimates whenever pos- 
sible. They take too much time and are costly to both contractor and 
government during a period of limited funds. Moreover, for many pur- 
poses they have been found to be less accurate than estimates made 
statistically. One reason is simply that the whole often turns out 
to be greater than the sum of 4500 parts. The detail estimator works 
under the same disadvantages as do all other estimators before an item 
has been produced. He works from sketches, blueprints, or word
descriptions of some item that has not been completely designed, and he
can assign costs only to work that he knows about. (An attempt is
sometimes made to estimate the completeness of the work statement, and
this estimate becomes a factor to apply to the detail estimate; e.g., if
the work statement is judged to be 50 percent complete, the detail
estimate is multiplied by two.) The effect of a low estimate is
compounded because detail estimating is normally attempted only on a
portion of production labor hours. A number of production labor
elements, such as rework, planning time, and coordination effort, are
usually factored in as percentages of the detail estimate. Then, other
cost elements, such as sustaining effort, tool maintenance, quality
control, and manufacturing research, are factored in as percentages of
production labor. Thus, small errors in the detail estimate can result
in large errors in the total.
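
The compounding described above can be made concrete with a short calculation. The completeness factor of 50 percent follows the example in the text; every percentage factor below is a hypothetical value chosen only to show how an error in the detail estimate is magnified.

```python
def total_from_detail(detail_hours, completeness, labor_factors, other_factors):
    """Build a total from a detail estimate by successive factoring.

    completeness  -- judged fraction of the work statement that is complete
    labor_factors -- production labor elements as fractions of the detail estimate
    other_factors -- other cost elements as fractions of production labor
    """
    adjusted = detail_hours / completeness
    production_labor = adjusted * (1.0 + sum(labor_factors.values()))
    return production_labor * (1.0 + sum(other_factors.values()))


# Hypothetical factors, for illustration only.
LABOR = {"rework": 0.10, "planning": 0.10, "coordination": 0.05}
OTHER = {"sustaining": 0.20, "tool maintenance": 0.10,
         "quality control": 0.20, "manufacturing research": 0.10}

base = total_from_detail(1000.0, 0.5, LABOR, OTHER)
low = total_from_detail(900.0, 0.5, LABOR, OTHER)
print(round(base, 1), round(base - low, 1))
```

Under these assumed factors each hour in the detail estimate carries four hours in the total, so a 100-hour shortfall in the detail estimate becomes a 400-hour shortfall overall.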

A second reason for considering industrial engineering standards 
less accurate than estimates made statistically has already been sug- 
gested. Significant variability in the fabrication and assembly of 
successive production units is, and will continue to be, characteristic
of the industry. Production runs of like models tend to be of limited
length and are characterized by numerous design changes. In the
case of military aircraft, production rates have tended to vary fre- 
quently and at times unexpectedly. The proportion of new components 
in equipment is probably higher in the aerospace industry than in any 
other. The effect of these factors can be represented statistically 
by the learning or progress curve so characteristic of this industry. 
One set of fabrication and assembly modes is succeeded by more effi- 
cient production functions, which lower the total labor requirement. 
The introduction of engineering changes causes discontinuities in this 
process but does not interfere with the general trend. If new manu- 
facturing processes and techniques are introduced, these may cause 
changes in past relationships. History, however, seems to show that 
changes in manufacturing and management techniques, although they may 
have dramatic impacts in circumscribed areas, tend to result in only 
gradual changes over the entire process. 

Because a private concern generally has information only on its
own products, much of the estimating in industry is based on analogy,
particularly when a firm is venturing into a new area. For example,
in the 1950s, aircraft companies bidding on ballistic missile programs 
drew analogies between aircraft and missiles to develop estimates for 
the latter. Douglas Aircraft Company (now McDonnell-Douglas) made a 
good estimate on the Thor intermediate range ballistic missile by com- 
paring Thor with the DC-4 transport airplane. This company later based 
its estimates of the Saturn S-IV stage on its Thor experience. Even 
with appropriate adjustments for differences in size, the number of 
engines, higher performance, and insulation problems (the need to cope 
with liquid hydrogen as well as liquid oxygen), this attempt was not 
as successful as the first. 

At all levels of aggregation, much estimating is performed by 
this type of analogy: System A required 100,000 hours; given the 
likenesses and differences in design and in performance of proposed 
System B, the requirement for B is estimated at, say, 120,000 hours. 
Or, at a different level, engineers and shop foremen may rely on anal- 
ogies when making a grass-roots estimate; in this event, analogy be- 
comes part of the industrial engineering approach. The major drawback 
to estimating by analogy is that it is essentially a judgment process 
and, as a consequence, requires considerable experience and expertise 
to be done successfully. For the government cost analyst, analogy can 
be useful for a rough check of an estimate; however, when making esti- 
mates, analogy based on a sample of 1 adjusted by some complexity fac- 
tor should be avoided. This caveat rests on the contention that first, 
it is poor statistics; second, it is nonreproducible; and third, it 
cannot be evaluated by the user of the estimate. 

Although statistical procedures are preferable in most situations, 
there are circumstances when analogy or industrial engineering tech- 
niques are required because the data do not provide a systematic his- 
torical basis for estimating cost behavior. It may be that a new item 
is to be constructed of some unfamiliar material, or that a design 
consideration is so radically different that statistical procedures 
are inadequate. The use of new structural material for aircraft often 
requires the development of special cutting and forming techniques
with manufacturing labor requirements that differ significantly from
those based on a sample of primarily aluminum airframes. Faced with
this problem when titanium was first considered for use in airframe
manufacture, airframe companies developed standard-hour values for
titanium fabrication on the basis of shop experience in fabricating test
parts and sections. Ratios of these values to those for comparable
operations on aluminum aircraft were prepared, and these ratios were
then used in existing statistical estimating relationships. Thus,
while industrial engineering procedures were used to provide input data,
the approach remained statistical.

A similar situation occurs in the case of industrial facilities. 
Requirements for these cannot be estimated without knowing the contrac- 
tor's identity and the extent and availability of his existing plant. 
Consequently, the cost of facilities must be estimated from information 
available for each specific case. 

There will always be situations in which analogy or industrial en- 
gineering techniques are required, but in general the statistical ap- 
proach is useful in a wide range of contexts, whether the purpose is 
long-range planning or contract negotiation. In the former, a more 
highly aggregated procedure may be used because it ensures comparabil- 
ity when little detailed knowledge about the equipment is available. 
Total hardware cost may be estimated as a function of one or more ex- 
planatory variables; e.g., engine cost as a function of thrust, or 
transmitter cost as a function of power output and frequency. However, 
this approach is often a matter of necessity, not choice. Even for 
long-range planning, it is sometimes desirable to estimate in some 
detail. 

To say that statistical techniques can be used in a variety of 
situations does not imply that the techniques are the same for all sit- 
uations. They will vary according to the purpose of the study and the 
information available. In a conceptual study, it is necessary to have 
a procedure for estimating the total expected costs of a program, and 
this must include an allowance for the contingencies and unforeseen 
changes that seem to be an inherent part of most development and pro- 
duction programs. 


Similarly, a long-range planning study will use industry-wide
labor and burden rates and an estimated learning-curve slope; later in
the acquisition cycle, data that are specific for a particular contrac- 
tor in a particular location can be used. In effect, this procedure 
merely asserts the obvious: As more is known, fewer assumptions are 
required. When enough is known, and this means when a product is well 
into production, accounting information and data can be taken directly 
from records of account and used with a minimum of statistical manip- 
ulation. This technique is useful only in those cases when the future 
product or activity under consideration is essentially the same (both 
in terms of configuration and scale of production or operation) as 
that for the past or current period. 

In any situation the estimating procedure to be used should be 
determined by the data available, the purpose of the estimate, and, 
to an extent, by such other factors as the time available to make an 
estimate. The essential idea to be conveyed in this section is that, 
when properly applied, statistical procedures are varied and flexible 
enough to be useful in most situations that aerospace equipment cost 
analysts are likely to encounter. Although no specified set of pro- 
cedures can guarantee accuracy, decisions must be made; it is essen- 
tial that they be based on the best possible information. The analyst 
must seek the approaches that will provide the best possible answers,
given the basic information that is available. 

Although the content of this memorandum is limited to methods of 
estimating equipment cost, any decision to undertake a new program 
typically takes into consideration far more than the outlays needed 
to develop and produce the equipment. For example, there may be a 
need for complementary hardware, such as launchers or test equipment; 
possibly additional construction will be needed, such as lengthened 
runways or hardened shelters. Other investment items may include the 
cost of personnel training, computer programming services, and develop- 
ment of technical data. However, a number of items that contribute to 
system operating cost (particularly spares) are usually estimated as 
a function of total equipment cost. 

In addition to the initial investment that is needed to establish
a new capability, there are costs of operating and maintaining
equipment that continue as long as it is in the active inventory. These
recurring costs include

• Replacement of common (or organizational) equipment.
• Replenishment of spare parts and supplies.
• Fuels, lubricants, and propellants.
• Training ordnance and other expendables.
• Personnel costs.
• Facilities maintenance.
• Training of replacements.
• Maintenance and other logistics support by separate
  organizations.


These operating costs are far more important in the lifetime total
cost computation than their annual figure might suggest. In fact,
since the life of a modern weapon system may run ten years (or longer),
the investment needed to establish a new system may be dwarfed by the
costs required to operate and to maintain it. The practical
consequence of this observation is that when the overall study is
constrained by time and personnel limitations, as is often the case, the
estimation of equipment costs can be accorded only a reasonable share
of the time and personnel available for the whole study.
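
The point about lifetime cost can be illustrated with simple arithmetic. The investment and annual operating figures below are invented; only the ten-year life comes from the text, and discounting is deliberately ignored in this sketch.

```python
def lifetime_cost(investment, annual_operating, years):
    """Undiscounted lifetime cost: initial investment plus recurring costs."""
    return investment + annual_operating * years


# Hypothetical system: investment of 500, operating cost of 100 per year.
total = lifetime_cost(500.0, 100.0, 10)
operating_share = (100.0 * 10) / total
print(total, round(operating_share, 3))
```

Here an annual operating cost equal to one-fifth of the investment accounts for two-thirds of the ten-year total.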


II. DATA COLLECTION AND ADJUSTMENT 


THE GOVERNMENT has been collecting cost and program data on weapon and 
support systems for many years--sometimes in detail, sometimes in highly 
aggregated form. Consequently, it is surprising that the right data 
seldom seem to be available when an estimating job is required. It ap- 
pears that the needs of the cost analyst have not always been consid- 
ered in designing the many information systems that have been used by 
the Army, Navy, and Air Force. Data have been collected for program 
control, for program management, and for program audit, but this infor- 
mation has never been systematically processed and stored. Instead, 
after a few years it has generally been discarded or placed in not read- 
ily accessible warehouses. Moreover, the data were often inconsistent 
since they were gathered according to the requirements of each military 
service and each program manager. To obtain the data to develop
estimating relationships, the analyst has had to use contractor records.


Data Collection 


The Cost Information Report (CIR) was established in 1966 to alle- 
viate the problem of data collection. This reporting system was de- 
signed to collect costs and related data on major contracts for air- 
craft and missile and space programs to assist industry and government 
in estimating and analyzing the costs of these programs. Information 
from other sources (contract records, management records, and the like)
can be processed to complement the CIR and thus make complete program
histories available. (Subsequent sections of this study describe the
methods of analysis that this information was designed to serve.) As 
data accumulate over a period of years, the need for ad hoc collection 
efforts should diminish. These efforts will never disappear completely, 
however, as information systems cannot be designed to satisfy every 
data requirement. Under ideal conditions, the analyst would have data 
with which to develop estimating techniques responsive to any demand, 
but even the largest contractors are reluctant to allocate the resources 
required to put estimators in such a favorable position, and the cost
to the Department of Defense (DOD) for such data--much of which would
seldom be used--would be prohibitive. However, a government analyst or
estimator has one great advantage over his counterpart in industry:
He has a much broader data base to draw on.

A minimum data requirement exists for any given job, but before 
data collection begins the analyst must consider the scope of his pro- 
blem, define generally what he wants to do, and decide how to do it. 
The data required to estimate equipment costs for a long-range plan- 
ning study can be substantially less than those needed to prepare an 
independent cost estimate for contract negotiation. In the former, 
total equipment costs may suffice; in the latter, costs must be col- 
lected at the level of detail in which the contract is to be negotia- 
ted. For major items, this means a functional breakout, e.g., direct 
labor, materials, engineering, and tooling. One could postulate pro- 
blems requiring even a greater amount of detail. Suppose, for example, 
that two similar hardware items had substantially different costs.
Only by examining the cost detail could this difference be explained.

In performing this initial appraisal of the job, the analyst will 
be aided by a thorough knowledge of the kind of equipment with which 
he will be dealing--its characteristics, the state of its technology, 
and the available sample. With this knowledge he can determine the 
kinds of data that are required and that are available for what he 
wants to do, where the data are located, and the kinds of adjustments 
that may be required to make the collected data base consistent and 
comparable. Only after the problem has been given this general
consideration should the task of data collection begin. All too often
large amounts of data are collected with little thought about use.
The result is that some portion may be unnecessary, unusable, or not
completely understood. Data collection is generally the most
troublesome and time-consuming part of cost analysis. Consequently,
careful planning in this phase of the overall effort is well worthwhile.


Historical Data 


To develop a cost-estimating procedure, at least three different 
types of historical data are required. First, there are the resource 
data, usually in the form of expenditures and labor hours. It is cus- 
tomary to apply the word cost to both, and that practice is followed 
throughout this text. A second type of data describes the possible 
cost-explanatory elements; for hardware such as aircraft and missiles 
this means performance and physical characteristics. The third type 
is program data, i.e., information related to the development and
production history of past hardware programs.


Resource Data 


Resource data are generally classified under end-item categories 
or functional categories. Examples of the former, in various possible
levels of detail, are system, subsystem, component, and part. The
functional cost categories, such as engineering, tooling, manufacturing,
quality control, and purchased equipment, are usually broken down into cost
elements--labor, material, overhead, and other direct charges. The 
data source is the contractor's plant. Generally, the accounting sys- 
tems will vary from one company to another, and the amount of detail is 
immense. A typical airframe company, for example, sets up the produc- 
tion process on the basis of a number of different jobs or stations, 
each identified by a number or symbol. All manufacturing direct labor 
and material (depending on the type of cost-accounting system) expended 
on a given job is recorded on a job order or, as is becoming
increasingly common, fed directly into a computer. When such a system
is used, the actual hours incurred for every operation are available
to management, and these costs can be aggregated as they are needed.
Manufacturing costs of this type can be attributed to a lot or often to
a single unit. (Some categories of cost are not identifiable by lot or 
unit, e.g., tooling and engineering.) But since contractors organize 
their work differently, different job orders will be used. This means 
that data at more detailed levels may vary from contractor to contractor 
and may not be comparable. Also, detailed information of this kind is
unnecessary for most government analysis and should rarely be sought. 
If there were a need to estimate in more detail, the data required 
would increase by at least an order of magnitude, and data processing 
equipment would become a necessity. When to incorporate automatic data 
processing techniques into the data collection effort is determined 
primarily by the volume of data to be handled. The trend in the aero- 
space industry is to rely more and more on computers for internal data 
needs, and for some purposes data have been provided to the government 
on punched cards or magnetic tape. Thus, there are no technical rea- 
sons why cost data could not be obtained in this form should it be 
more convenient to the cost analyst but, as mentioned earlier, there 
are good reasons not to use excessive detail even if it is readily 
available: Expense increases and accuracy is unlikely to improve. 
Theoretical considerations aside, estimating techniques must be 
based on whatever resource data the analyst can find, and in the past 
the availability of data has varied from one kind of equipment to an- 
other. To illustrate, aircraft airframe estimating procedures tend to 
be different from those developed for other types of equipment. An
airframe model may contain all of the following categories:

   • Initial and sustaining engineering.
   • Flight test operations.
   • Initial and sustaining tooling.
   • Manufacturing labor.
   • Manufacturing material.
   • Quality control.


Such a list of cost categories is desirable for all hardware estimating,
but because of data limitations, present procedures for engines often
cover only two phases of the procurement cost, development and
production, and avionics procedures only one, procurement cost to the
government. The CIR should expand these possibilities in the future.


Physical and Performance Characteristics 


Information about the physical and performance characteristics of 
aircraft and missile and space systems is just as important as resource 
data. Data collection in this area can be time-consuming, particularly 
since it is not often clear in advance what data will be required. The 
goal, of course, is to obtain a list of those characteristics that best 
explain differences in cost. Weight is a commonly used explanatory 
variable, but weight alone is seldom enough; speed is almost always
included as a second explanatory variable for aircraft airframes. One
estimating procedure* for aircraft uses all of the following:

   • Maximum speed at optimal altitude.
   • Maximum speed at sea level.
   • Year of first delivery.
   • Total airframe weight.
   • Increase in airframe weight from unit 1 to unit n.
   • Weight of installed equipment.
   • Engine weight.
   • Electronics complexity factor.


In addition, the following characteristics were considered for inclusion
as part of the estimating procedure, although they were not used:

   • Maximum rate of climb.
   • Maximum wing loading.
   • Empty weight.
   • Maximum altitude.
   • Design load factor.
   • Maximum range.
   • Maximum payload.


*Planning Research Corporation, Methods of Estimating Fixed-wing
Airframe Costs, Vol. I (Revised), PRC R547A, Los Angeles, April 1967.


16 EQUIPMENT COST ESTIMATING 


At the outset of a study undertaken to develop an estimating re- 
lationship for aircraft cost, the cost analyst would not know which of 
all these characteristics would provide the best explanation of vari- 
ations among the cost of different aircraft; he would of necessity try 
to be as comprehensive as possible. An analyst who is familiar with 
the type of hardware under study will have some idea of the most likely 
candidates, but he will generally consider more characteristics than
will eventually be used.


Program Data 


A third type of essential data is drawn from the development and 
production history of hardware items. The acceptance date of the item, 
the significant milestones in the development program, the production 
rates, and the occurrence of major and minor modifications in produc- 
tion--all such information can contribute to the development of cost- 
estimating relationships. The list of explanatory variables discussed 
in the previous section includes year of first delivery and increase 
in airframe weight from unit 1 to unit n, information that would be
included in the category program data. 

An airframe typically changes in weight during both development 
and production as a result of engineering changes. For example, the
weight of the F-4D varied as follows:

      Cumulative          Airframe
      Plane Number        Unit Weight (lb)
      [values illegible in source]

Since labor hours are commonly associated with weight to obtain hours-
per-pound factors, it is important to obtain weights applicable to each 
production lot if airframe weights by unit are not available. 

The need for other kinds of program data will be clarified under 
the discussion on data adjustment. To cite one example here, the year
in which expenditures occur must be known to adjust cost data for
price-level changes. (This is the reason for at least one CIR submission
annually.) A certain amount of program data cannot be specified with this
degree of precision nor can the use of these data be foretold, but the 
information is important nonetheless. It is what might be called back- 
ground information--data on other activities in the contractor's plant 
at the time a particular hardware item is being built; unusual problems 
the contractor may be encountering; attempts to compress or stretch out 
the program; and inefficiencies that are noted. This information may 
be useful in explaining those factors that appear to be aberrations when 
the resource data are compared with those from other development and 
production programs. In addition, a history of a contractor's overhead, 
general and administrative costs, and labor rates is useful for
analyzing and predicting costs.


Data Adjustment 


To be useful to the cost analyst, data must be consistent and com- 
parable, and in most cases the data as collected are neither. Hence, 
before estimating procedures can be derived, an adjustment must be made 
for definitional differences, production quantity differences, yearly 
price changes, and so on. The more common adjustments are examined in 
this section. It is by no means an exhaustive treatment of the subject: 
The list of possible adjustments is long and many of them will apply 
only in a very small number of cases. Also, evidence on certain types 
of adjustments (for contractor efficiency, for contract type, for pro- 
gram stretch-out) consists largely of opinion rather than hard data. 
While the cost analyst may allude to such adjustments, the research
necessary to treat them in some definitive way has not yet been done.


Definitional Differences 


Different contractor accounting practices and make-or-buy arrangements
are primary reasons why adjustment of the basic cost data is generally
necessary. Companies record their costs in different ways. Often they
are required to report costs to the government by categories that
differ from those used internally. Also, government reporting
categories change from time to time. Because of these definitional
differences, one of the first steps in cost analysis is to state the
definition that is being used and to adjust all data to this definition.
With the inception of the CIR, a standard set of definitions for air- 
frames has been established for use throughout the DOD. A primary pur- 
pose of the CIR is to overcome the problem of definitional differences 
in hardware cost data. For the next few years, however, most data will 
antedate the CIR and some adjustment will be required. 

As an example of what may be expected, a cost analyst may be ex- 
amining data from a sample of ten hardware items and discover that the 
cost category Quality Control is missing for some of the earlier items. 
He may conclude that no quality control was exercised in the 1950s or 
that this function is included in another cost element. The latter 
assumption is correct. Traditionally, Quality Control was carried in 
the burden account, and it was only in the late 1950s that it began to 
appear (at the request of the DOD) as a separate element. Hence, to 
use cost data on equipment built prior to this change requires convert- 
ing a portion of overhead cost to Quality Control. 

A more current example involves Planning, which in the CIR defi- 
nition is included in Tooling. Planning consists of two components-- 
tool planning and production planning. A company may put the first in 
Tooling and the second in Manufacturing. Other practices are to include 
tool planning in Engineering, to put all planning in Manufacturing, or 
to include a portion in Overhead. 

Table 1 illustrates this problem more concretely. A slightly ab- 
breviated version of the CIR list of cost elements appears on the left; 
on the right, the cost elements used by a large aerospace company and 
the nonrecurring costs of a proposed airframe. The lists are differ- 
ent and, as shown by Table 2, a simple rearrangement of the contractor 
cost elements does not solve the adjustment problem. Four of the con- 
tractor cost elements remain: Developmental Material ($2.6 million), 
Outside Production ($70 thousand), Other Direct Charges ($2.7 million), 
and Manufacturing Overhead ($28.94 million). These are not trivial
adjustments: These four elements can amount to well over half the total
cost of a large production contract. Developmental Material presumably
would be split between Engineering Material and Manufacturing Material;
Other Direct Charges would have to be allocated among Engineering,
Tooling, Quality Control, and Manufacturing; and part of Manufacturing
Overhead would be apportioned to Tooling Overhead and Quality Control
Overhead. In each of these instances, the contractor who furnished the
CIR information would be able to make the necessary adjustments from
his own


Table 1

ILLUSTRATIVE COMPARISON OF CIR AND AIRFRAME CONTRACTOR COST ELEMENTS

                                      Airframe Contractor
                            -----------------------------------------
CIR                                                      Nonrecurring
Cost Element                Cost Element                 Costs
                                                         ($ thousands)

Engineering                 Engineering ..................     8,600
  Direct labor              Manufacturing
  Overhead                    Developmental direct labor .     2,500
  Material                    Tooling direct labor .......    11,600
  Other direct charges        Production direct labor ....       850
Tooling                       Developmental material .....     2,600
  Direct labor                Tooling material ...........     2,600
  Overhead                    Production material ........       500
  Materials and               Purchased equipment ........         5
    purchased tools           Outside production .........        70
  Other direct charges      Inspection ...................       620
Quality Control             Other Direct Charges .........     2,700
  Direct labor              Overhead
  Overhead                    Engineering ................    10,200
  Other direct charges        Manufacturing ..............    28,940
Manufacturing
  Direct labor
  Overhead
  Materials and
    purchased parts
  Other direct charges
Purchased Equipment
Material Overhead


accounting records. Outside Production costs, although small in this
example, may constitute 30 to 40 percent of the total cost of an
airframe in some cases. When this happens, the labor hours and material
costs incurred by the prime contractor fall far short of the total
required to build an airplane; a method of arriving at a total must be


Table 2

AIRFRAME CONTRACTOR COST ELEMENTS ARRANGED IN CIR FORMAT

                                      Airframe Contractor
                            -----------------------------------------
CIR                                                      Nonrecurring
Cost Element                Cost Element                 Costs
                                                         ($ thousands)

Engineering
  Direct labor              Engineering                        8,600
  Overhead                  Engineering overhead              10,200
  Material                  --
  Other direct charges      --
Tooling
  Direct labor              Tooling direct labor              11,600
  Overhead                  --
  Materials and
    purchased tools         Tooling material                   2,600
  Other direct charges      --
Quality control
  Direct labor              Inspection                           620
  Overhead                  --
  Other direct charges      --
Manufacturing
  Direct labor              Developmental direct labor         2,500
                            Production direct labor              850
  Overhead                  --
  Materials and
    purchased parts         Production material                  500
  Other direct charges      --
Purchased equipment         Purchased equipment                    5
Material overhead           --


devised to permit the data to be analyzed on a comparable basis, i.e.,
on an equivalent 100-percent in-plant basis. Ordinarily, the contractor
would have a detailed breakout of costs only for subcontractors on
cost-reimbursable contracts, and other Outside Production costs would 
have to be allocated to the specified categories. Production labor 
hours incurred out of plant, for example, are often estimated on the 
basis of the weight of that portion of the airframe being built out of 
plant. In using historical data, the analyst may be in a similar
position: When the amounts involved are large, he should be guided by
whatever information the contractor can provide.
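
The rearrangement from contractor cost elements to the CIR format can be
sketched in a few lines of Python. This is an illustrative sketch only:
the names CONTRACTOR_COSTS, CIR_PLACEMENT, and arrange_in_cir_format are
hypothetical, the dollar figures are those of Tables 1 and 2, and the
mapping covers only the elements that Table 2 places directly.

```python
# Contractor cost elements and nonrecurring costs ($ thousands), Table 1.
CONTRACTOR_COSTS = {
    "Engineering": 8_600,
    "Developmental direct labor": 2_500,
    "Tooling direct labor": 11_600,
    "Production direct labor": 850,
    "Developmental material": 2_600,
    "Tooling material": 2_600,
    "Production material": 500,
    "Purchased equipment": 5,
    "Outside production": 70,
    "Inspection": 620,
    "Other direct charges": 2_700,
    "Engineering overhead": 10_200,
    "Manufacturing overhead": 28_940,
}

# Direct placements from Table 2:
# contractor element -> (CIR category, CIR cost element).
CIR_PLACEMENT = {
    "Engineering": ("Engineering", "Direct labor"),
    "Engineering overhead": ("Engineering", "Overhead"),
    "Tooling direct labor": ("Tooling", "Direct labor"),
    "Tooling material": ("Tooling", "Materials and purchased tools"),
    "Inspection": ("Quality control", "Direct labor"),
    "Developmental direct labor": ("Manufacturing", "Direct labor"),
    "Production direct labor": ("Manufacturing", "Direct labor"),
    "Production material": ("Manufacturing", "Materials and purchased parts"),
    "Purchased equipment": ("Purchased equipment", None),
}


def arrange_in_cir_format(costs, placement):
    """Sum costs into CIR elements; return (placed, unplaced) dictionaries."""
    placed, unplaced = {}, {}
    for element, amount in costs.items():
        if element in placement:
            key = placement[element]
            placed[key] = placed.get(key, 0) + amount
        else:
            # No one-to-one CIR home: needs allocation from contractor records.
            unplaced[element] = amount
    return placed, unplaced


placed, unplaced = arrange_in_cir_format(CONTRACTOR_COSTS, CIR_PLACEMENT)
total = sum(CONTRACTOR_COSTS.values())
print(sorted(unplaced))
print(sum(unplaced.values()), total)  # 34310 71785
```

In this example the four unplaced elements amount to about 48 percent of
the nonrecurring total, which is why a simple rearrangement is not
enough.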


Physical and Performance Considerations 


A problem that resembles the one discussed above is the need for 
consistency in definitions of physical and performance characteristics. 
For example, speed can be defined in many ways--maximum speed at opti- 
mal altitude, true speed, equivalent speed, indicated speed. All of 
these defining terms differ in exact meaning and value. The weight of 
an aircraft or missile depends on what is included. Gross weight, 
empty weight, and airframe unit weight apply to aircraft, but each of 
these terms also differs in exact meaning and value. Some agencies in- 
clude sweep volume in their definition of the physical volume of an air- 
craft fire control system; others exclude it. Differences such as these 
can lead an analyst unfamiliar with the equipment to use inconsistent 
or varying values inadvertently. When data are being collected from a 
variety of sources, an understanding of the terms used to describe
physical and performance characteristics is at least as important as an
understanding of the content of the various cost elements.


Nonrecurring and Recurring Costs 


Another problem that involves questions of definition concerns 
nonrecurring and recurring costs. Recurring costs are a function of 
the number of items produced; nonrecurring costs are not. Thus, for 
estimating purposes it is useful to distinguish between the two, and
the CIR provides for this distinction. Unfortunately, historical cost
data frequently show such cost elements as nonrecurring and recurring
engineering hours as an accumulated item in the initial contract.
Various analytical techniques have been developed for dividing the total
into its two components synthetically, but it is not clear at this time
whether the nonrecurring costs that are obtained by ex post facto
methods will be comparable with those reported in the CIR. The CIR
instructions state:


     it is preferable to identify the point of segregation be-
     tween nonrecurring and recurring engineering costs as a
     specific event or point in time. Ideally, the event used
     would be the point at which "design freeze" takes place as
     a result of a formal test or inspection, and after which
     formal Engineering Change Proposal (ECP) procedures must
     be followed to change design. If no reasonable event can
     be specified for this purpose, then all engineering costs
     incurred up to the date of 90 percent engineering drawing
     release may be used.*

Although it would be premature to consider the kinds of adjustments 
needed before a body of CIR data exists, splicing historical data to 
CIR data may also involve adjustments. 

A more subtle problem arises when nonrecurring costs on one prod- 
uct are combined with recurring costs on another, i.e., when the con- 
tractor is allowed to fund development work on new products by charging 
it off as an operating expense against current production. This prac- 
tice is especially prevalent in the aircraft engine industry. Separa- 
tion of the nonrecurring and recurring costs means an adjustment of 
the production costs shown in contract or audit documents to exclude 
any amortization of development. The nonrecurring expense that has 
been amortized can then be attributed to the item for which it was in- 
curred. Such an adjustment can only be accomplished in cooperation 
with the accounting departments of the companies that are involved. It
would not be necessary, of course, for equipment on which CIR data are
available.


*U.S. Department of Defense, Cost Information Report (CIR) for
Aircraft, Missile, and Space Systems, Budget Bureau No. 22-R260,
Washington, D.C., April 21, 1966, p. 7-43.


DATA COLLECTION AND ADJUSTMENT 23 


Price-level Changes 


Figure 1 shows the change in average hourly earnings of production 
workers on manufacturing payrolls from 1920 to 1965. Although these 
earnings declined slightly during the early 1920s and again during the 
Depression, the trend has been steadily upward since 1934. The hourly 
wage rate has increased by a factor of 4.75 over a 45-year period; in 
other words, a manufacturer paid $4.75 for labor in 1965 that would 
have cost him $1.00 in 1920. The implication for equipment cost is 
clear. If the labor component of an automobile cost $500 in 1920, the 
cost for the same car today would be something over $2000; however, the 
hours required in 1965 would be less because of increased productivity. 

Fig. 1--Change in hourly earnings

The relevance of these observations to the subject of data adjustment
is that the manufacturing dates of the different hardware items in
a sample are normally spread over a period perhaps as long as ten to
fifteen years. To compare a missile built in 1955, when labor cost about
$2.35 per hour, with a missile built ten years later, when the labor
rate had increased to over $3.35 per hour, requires that the labor cost of
both be adjusted to a common base. (This problem is obviated by deal- 
ing in hours rather than dollars, but an adjustment would still be 
needed for raw material and purchased parts.) Adjustments are made by 
means of a price index constructed from a time-series of data in which 
one year is selected as the base and the value for that year expressed 
as 100. The other years are then expressed as percentages of this base. 
The hourly earnings from 1950 to 1960 for production workers could be 
converted to an index using any of the years as the base; in Table 3,
1950 and 1960 have both been used as base years.


Table 3

AVERAGE HOURLY EARNINGS INDEX

         Average
         Hourly       Index with    Index with
         Earnings     1950 as       1960 as
Year     ($)          Base Year     Base Year
1950     1.44         100            64
1951     1.56         108            69
1952     1.65         115            73
1953     1.74         121            77
1954     1.78         124            79
1955     1.86         129            82
1956     1.95         135            86
1957     2.05         142            91
1958     2.11         147            93
1959     2.19         152            97
1960     2.26         157           100

SOURCE: U.S. Department of Labor, Employment
and Earnings Statistics for the United States,
1909-66, Bulletin No. 1312-4, Washington, D.C.,
October 1966.
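
The construction of the two index columns in Table 3 can be reproduced
with a short Python sketch. The function names price_index and
to_base_year_dollars are hypothetical, and the earnings figures are
taken from Table 3 as best they can be read; rounding to whole index
points is assumed because that is how the table is printed.

```python
# Average hourly earnings ($) of production workers, 1950-1960 (Table 3).
EARNINGS = {
    1950: 1.44, 1951: 1.56, 1952: 1.65, 1953: 1.74, 1954: 1.78, 1955: 1.86,
    1956: 1.95, 1957: 2.05, 1958: 2.11, 1959: 2.19, 1960: 2.26,
}


def price_index(series, base_year):
    """Express each year's value as a percentage of the base-year value."""
    base = series[base_year]
    return {year: round(100 * value / base) for year, value in series.items()}


def to_base_year_dollars(cost, cost_year, index):
    """Restate a cost incurred in cost_year in dollars of the index base year."""
    return cost * 100 / index[cost_year]


index_1950 = price_index(EARNINGS, 1950)
index_1960 = price_index(EARNINGS, 1960)
print(index_1950[1955], index_1960[1955])  # 129 82
```

Dividing a labor cost by the index for the year in which it was
incurred, and multiplying by 100, restates it in base-year dollars;
this is what to_base_year_dollars does.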


The information needed to construct a labor index is available in 
the Bureau of Labor Statistics (BLS) monthly publication Employment and 
Earnings, and Table 4 presents indexes based on this source. Changes in 
materials costs are available in another BLS monthly publication,
Wholesale Prices and Price Indexes. These indexes can be used to develop
a materials price index for a given type of equipment by selecting from

Table 4

LABOR PRICE INDEXES

                  Aircraft    Other
                  Engines     Aircraft    Motor       Electrical
                  and         Parts       Vehicles    Equipment    Ship
                  Engine      and         and         and          and Boat
Year   Aircraft   Parts       Equipment   Equipment   Supplies     Building
1952     .59        .61        na(a)        .61         .64          .62
1953     .63        .63        na           .64         .67          .67
1954     .66        .65        na           .66         .69          .68
1955     .69        .67        na           .69         [?]          .70
1956     [?]        .71        na           [?]         [?]          .74
1957     [?]        [?]        na           .74         [?]          .79
1958     .80        [?]        .79          .76         .82          .82
1959     .84        .83        .83          .81         .85          .85
1960     .86        .86        .86          .84         .88          .88
1961     .88        .89        .88          .86         .91          .93
1962     .91        .92        [?]          .90         .93          .95
1963     .94        .94        .94          .93         .95          .99
1964     .95        .97        .97          .96         .97         1.00
1965    1.00       1.00       1.00         1.00        1.00         1.00

(a) Not available for years prior to 1958. For the years 1952-1957,
the labor price index for aircraft should be used.


the commodity groups in the Wholesale Price Index a list of materials
representative of those used in constructing the equipment; these
materials are then weighted according to estimates of the value of each
in fabricating the equipment. A composite aircraft raw-materials index
might be based on the following materials and weights:

      Finished steel ..............  .02
      Stainless steel sheet .......  .04
      Titanium sponge .............  .07
      Aluminum sheet ..............  .29
      Aluminum rod ................  .11
      Aluminum extrusions .........  .20
      Wire and cable ..............  .12
      Rivets, nuts, bolts .........  .15


For any given year a price index for each of these is obtained and a 
composite index constructed by summing the individual index numbers
multiplied by the weightings as shown in Table 5. Weights in an index

Table 5

AIRCRAFT RAW-MATERIALS INDEX

                           1967 Index                 Index
Commodity                  Number(a)      Weight      Number x Weight
Finished steel               105.8         .02            2.12
Stainless steel sheet        108.0         .04            4.32
Titanium sponge               60.3         .07            4.22
Aluminum sheet                99.8         .29           28.94
Aluminum rod                 110.4         .11           12.14
Aluminum extrusions           75.6         .20           15.12
Wire and cable               126.0         .12           15.12
Rivets, nuts, bolts          133.2         .15           19.98
Composite index number ...............................  101.96

(a) 1957-1959 = 100.


need to be updated from time to time to reflect changing technology;
it may be that those shown in Table 5 are applicable only to current
aircraft. Table 5 merely illustrates the principle of deriving a
composite index; the reader who wishes to pursue the matter will find
index numbers discussed in textbooks on economic statistics.* Another
type of composite index is used in those instances in which labor and 
material costs cannot be separated and the price-level adjustment has 
to be made to the total cost of an engine, airframe, or missile. Such 
an index can be derived in the manner illustrated in Table 4 with the 
labor and material elements weighted according to the pattern that has 
been found to exist in the past (e.g., labor, 80 percent; materials,
20 percent). Overhead, which is a mixture of indirect labor, materials,
and items such as rent, utilities, taxes, and fringe benefits, is
adjusted in most cases by the same percentage as direct labor. To decide
whether a different adjustment factor should be used, it would be
necessary to examine each of these components.
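
The weighted-sum construction of a composite index can be sketched in
Python as follows. The commodity index numbers and weights are those of
Table 5; the index numbers for aluminum extrusions (75.6) and for
rivets, nuts, and bolts (133.2) are inferred here from the weighted
products and the composite total and should be treated as assumptions.
The function names and the labor index value of 120 are hypothetical
and are used only for illustration.

```python
# (commodity, 1967 index number with 1957-1959 = 100, weight), per Table 5.
# The 75.6 and 133.2 entries are inferred values (assumption).
MATERIALS = [
    ("Finished steel", 105.8, 0.02),
    ("Stainless steel sheet", 108.0, 0.04),
    ("Titanium sponge", 60.3, 0.07),
    ("Aluminum sheet", 99.8, 0.29),
    ("Aluminum rod", 110.4, 0.11),
    ("Aluminum extrusions", 75.6, 0.20),
    ("Wire and cable", 126.0, 0.12),
    ("Rivets, nuts, bolts", 133.2, 0.15),
]


def composite_index(items):
    """Sum of index number times weight; the weights must total 1.0."""
    total_weight = sum(weight for _, _, weight in items)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(index * weight for _, index, weight in items)


def blended_index(labor_index, materials_index, labor_share=0.80):
    """Combine labor and materials indexes for a total-cost adjustment."""
    return labor_share * labor_index + (1.0 - labor_share) * materials_index


materials_index = composite_index(MATERIALS)
print(round(materials_index, 2))  # 101.96
print(round(blended_index(120.0, materials_index), 2))
```

The 80/20 split in blended_index follows the example weighting given in
the text; a different historical pattern would simply change
labor_share.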


*See, for example, W. A. Spurr, L. S. Kellogg, and J. H. Smith,
Business and Economic Statistics, rev. ed., Richard D. Irwin, Inc.,
Homewood, Illinois, 1961. It is important to recognize the differences
in indexes that may result from weighting by base year or a given year,
i.e., Laspeyres' or Paasche's index. These are also discussed in
textbooks on economic statistics.

The adjustment of costs for yearly price changes is not always as
straightforward as the foregoing discussion may imply. One problem is 
that price indexes are inherently inexact and their use, while neces- 
sary, can introduce errors into the data. The average hourly earnings 
for all aircraft production workers may increase by $.05 in a given 
year, but at any particular company they will increase more or less 
than that amount. Use of the average number to adjust the data for a 
given company may bias the data up or down. Also, for many specialized 
items of equipment, a good published price index does not exist. In 
fact, the usual indexes are oriented toward the civilian economy and 
may be misleading, i.e., they may understate the change experienced in 
defense and space industries. The United States, with many other
countries, furnishes the Organisation for Economic Co-operation and
Development (OECD) in Paris with an index applicable to government
defense expenditures in general. This index, shown in Table 6 for
1952-1964, is a useful reference when detailed index numbers seem
questionable or are nonexistent.


Table 6 


DEFENSE EXPENDITURES INDEX, 1952-1964 


          Index                  Index
Year      Number       Year      Number
1952        84         1959       102
1953        83         1960       104
1954        84         1961       105
1955        88         1962       106
1956        93         1963       108
1957        97         1964       113
1958       100
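
As a minimal sketch of how such an index is applied, the following
Python fragment restates a cost incurred in one year in the dollars of
another. The index values are those of Table 6 (the value of 100 in
1958 suggests that 1958 is the base year); the function name
convert_dollars and the $10 million example are hypothetical.

```python
# Defense expenditures index, 1952-1964 (Table 6).
DEFENSE_INDEX = {
    1952: 84, 1953: 83, 1954: 84, 1955: 88, 1956: 93, 1957: 97, 1958: 100,
    1959: 102, 1960: 104, 1961: 105, 1962: 106, 1963: 108, 1964: 113,
}


def convert_dollars(cost, from_year, to_year, index=DEFENSE_INDEX):
    """Restate a cost incurred in from_year in to_year dollars."""
    return cost * index[to_year] / index[from_year]


# Hypothetical example: $10 million spent in 1955, restated in 1964 dollars.
print(round(convert_dollars(10.0, 1955, 1964), 2))
```

Because the conversion uses only the ratio of two index numbers, the
choice of base year does not affect the result.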


Another problem is that of identifying the years in which expendi- 
tures occur when the only data available show total contract cost. Pro- 
duction and cash flow may have been spread out over a period of several 
years, and in principle the costs should be adjusted for each year sep- 
arately. Although the CIR will provide the information needed to do 
this in the future, this information may be unavailable today and some
reasonable approximation of the expenditure pattern must suffice.

One method of obtaining this approximation is to use a percent-of-cost
versus percent-of-time curve of the type illustrated in Fig. 2.
These curves are developed from historical data on a number of programs 
involving the same kind of hardware--large ballistic missiles in this 
case--and can be used to break total research and development or total 
production cost into annual expenditures. For example, to determine 
the annual expenditures in a five-year R&D program amounting to a total 
of $50 million the following percentages would be obtained from the R&D
curve of Fig. 2:


      Time (%)     Expenditures (%)
         20              5.5
         40             23.0
         60             65.0
         80             92.0
        100            100.0

These percentages are cumulative, of course, so the annual percentages
and the amounts they represent would be:

                   Expenditures
      Year     Percent     $ Millions
       1         5.5          2.75
       2        17.5          8.75
       3        42.0         21.00
       4        27.0         13.50
       5         8.0          4.00
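
The conversion from cumulative percentages to annual amounts is simple
differencing, as the following Python sketch shows. The cumulative
percentages are those read from the R&D curve in the example above; the
function name annual_expenditures is hypothetical.

```python
# Cumulative percent of cost at the end of each of five equal time periods.
CUMULATIVE_PERCENT = [5.5, 23.0, 65.0, 92.0, 100.0]


def annual_expenditures(total_cost, cumulative_percent):
    """Return (annual percent, annual amount) pairs from a cumulative curve."""
    results = []
    previous = 0.0
    for cumulative in cumulative_percent:
        share = cumulative - previous  # percent spent during this period
        results.append((share, total_cost * share / 100))
        previous = cumulative
    return results


for year, (percent, amount) in enumerate(
        annual_expenditures(50.0, CUMULATIVE_PERCENT), start=1):
    print(year, percent, amount)
```

Each annual amount can then be adjusted with the price index for its
own year before the amounts are summed.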


In the production phase, a technique that can be used is to de- 
velop lag factors by examining delivery schedules and production lead 
times. Costs are then lagged behind delivery dates by some reasonable 
factor. 

Fig. 2--Percent-of-cost versus percent-of-time curves

A more fundamental question than any of those raised above is
whether adjustments for yearly price changes should be made at all. It
is sometimes argued that the upward trend in wage rates has been
accompanied by a parallel trend in the output per employee or
productivity rate. This argument implies that there has been little
change in the real costs of aerospace equipment because increases in
wages and materials cost have been offset by a decrease in the number of
employees required per dollar of output. However, the real dollar output
per man is difficult to measure in an industry in which continual change
rather than standardization is the rule. Certainly the growth in
productivity is not uniform for aircraft, missiles, ships, and tanks,
and to develop a productivity index for each would be a difficult and
contentious task. Present practice, therefore, is to apply the
price-level adjustment factors to obtain constant dollars and, at the
same time, to remain alert to inequities that may be introduced by
following this procedure. As an illustration of the significance of
price-level adjustments, Fig. 3 shows the effect of adjusting production
costs incurred over the period 1959-1965 (open circles) to 1962 dollars
(closed circles). Both the level of cost and the slope of the curve
change as a result of the price-level adjustment. (In this example a
crossover occurs because the year 1962 has been selected as a base for
adjustment.)

Fig. 3--Effect of adjustment for price-level changes
(vertical axis: unit production cost, $ thousands)


Cost-quantity Adjustments 


The cost-quantity relationship, discussed at length in Sec. V, 
is usually known in the aerospace industry as the learning curve. The 
cost-quantity relationship may be defined in brief as follows: Each 
time that the total quantity of items produced doubles, the cost per 
item is reduced to some constant percentage of its previous value. 
Whether or not this particular formulation is accepted, the fact re- 
mains that, for most production processes, costs are invariably a 
function of quantity: As the number of items produced increases, cost 
normally decreases. Thus, in speaking of cost, it is essential that a
given quantity be associated with that cost. An equipment item can be
said to cost $100,000, $80,000, $64,000, or $51,200, and all of these
numbers will be correct.

Which cost should be used by the cost analyst? The answer will de- 
pend on a number of factors; if his purpose is to compare one missile 
with another, the cumulative quantity must be the same for both
missiles. The adjustment to a specific quantity is a simple matter if
the slope of the learning curve is known or if it can be inferred from
the data. Take, for example, the costs for three missiles:


Missile Unit Number Cost/Unit ($) 


A 50 1000 
B 100 1000 
C 200 1000 


Although the cost is the same for each, the number of units is
different. Thus, for a cost comparison, the units must be adjusted to a
common quantity. If 100 is chosen and an 80-percent learning curve
assumed for all three missiles, the adjusted costs will be as follows:


Missile Unit Number Cost/Unit ($) 


A 100 800 
B 100 1000 
C 100 1250 


To project labor requirements for the 100th unit when only 50 units have
been produced is somewhat uncertain, but to ignore the cost-quantity
relationship will in most instances result in greater error than such a
projection introduces. (The learning curve is most frequently depicted
as a straight line on logarithmic scales as shown in Fig. 3.)
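
The adjustment of the three missile costs to a common quantity can be
reproduced with the short Python sketch below. It assumes the
log-linear form of the learning curve, in which cost is proportional to
the unit number raised to the power log(slope)/log(2); the function
name unit_cost_at is hypothetical, and the 80-percent slope is the one
assumed in the example above.

```python
import math


def unit_cost_at(target_unit, known_unit, known_cost, slope=0.80):
    """Move a unit cost along a log-linear learning curve.

    slope is the fraction to which cost falls each time quantity doubles.
    """
    exponent = math.log(slope) / math.log(2)
    return known_cost * (target_unit / known_unit) ** exponent


# Missiles A, B, and C: cost of $1000 at units 50, 100, and 200.
for name, unit in (("A", 50), ("B", 100), ("C", 200)):
    print(name, round(unit_cost_at(100, unit, 1000.0)))  # 800, 1000, 1250

# The same item at units 1, 2, 4, and 8, starting from $100,000.
print([round(unit_cost_at(n, 1, 100_000.0)) for n in (1, 2, 4, 8)])
```

The second print statement reproduces the $100,000, $80,000, $64,000,
and $51,200 figures cited earlier, each of which is correct for a
different quantity.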


Other Possible Cost Adjustments 


The lack of a way to adjust cost data for productivity changes 
over time is illustrative of the current situation in which more kinds 
of cost adjustments have been theorized than have been quantified. 

For example, it has been suggested that adjustment may be required
because of differences in contract type (fixed-price,
fixed-price-incentive, cost-plus-fixed-fee contracts) or differences in
the type of procurement (competitive bidding or sole source). The
hypothesis is that the type of contract or procurement procedure will
bias costs up or down, but this hypothesis is difficult to substantiate.

Another question concerns manufacturing techniques. What are the 
effects of varying amounts of capital investment or capital improve- 
ment and of changes in manufacturing state of the art? A related ques- 
tion concerns the efficiency of the contractor. It may be surmised 
that Contractor A has been a lower cost producer than Contractor B on 
similar items, but this is extremely difficult to prove. A low-cost 
producer may be one who, because of his geographical location, pays 
lower labor rates. Contractors in Fort Worth, Texas, and in Atlanta, 
Georgia, may have a considerable advantage in this regard over their 
competitors in Los Angeles and San Francisco, California, and in Seattle, 
Washington. Table 7 does not give a fair picture of comparative rates 
because differences among industries in the various cities tend to be 
more important than differences in location. But, for two cities as 
close together as Los Angeles and San Francisco, labor rates differ by 
10 percent. Thus, although it might not be possible to adjust cost data 
on the basis of contractor efficiency, adjustments can be made for
differences in location by using the specific area labor rates.


Table 7 


AVERAGE HOURLY EARNINGS OF PRODUCTION WORKERS 
ON MANUFACTURING PAYROLLS, NOVEMBER 1965 
(in dollars) 


Atlanta ........................... [illegible]
Boston ............................ 2.69
Chicago ........................... 2.91
Detroit ........................... [illegible]
Los Angeles ....................... [illegible]
New Orleans ....................... 2.72
[illegible] ....................... [illegible]

SOURCE: U.S. Department of Labor, Bureau
of Labor Statistics, Employment and Earnings, 
Washington, D.C., January 1966. 


III. STATISTICAL METHODS IN DEVELOPMENT OF 
ESTIMATING RELATIONSHIPS 


MANY ESTIMATING RELATIONSHIPS are simple statements that indicate that 
the cost of a commodity is directly proportional to the weight, area, 
volume, or other physical characteristic of that commodity. These 
estimating relationships are simple averages; they are useful in a variety
of situations and, because of their simplicity, they require little
explanation. In this section, the statistical considerations involved 
in developing cost-estimating relationships for advanced equipment are 
examined. The emphasis is on the derivation of more complex relation- 
ships, i.e., equations that are able to reflect the influence on cost 
of more than one variable. The intent is to illustrate a general ap- 
proach to the development of such relationships and to introduce basic 
concepts of statistical analysis. The emphasis is not on statistics 
per se; the basic statistical theory as well as the computational as- 
pects involved in developing these relationships are included only to 
clarify practical considerations. Statistical analysis can help pro- 
vide an understanding of factors that influence cost, but estimating 
relationships are no substitute for understanding; regression analysis, 
which will be discussed in this study, does not offer a quick and easy 
solution to all the problems of estimating cost. 

The outstanding characteristic of a cost factor is that the rela- 
tionship between cost and the explanatory variable is direct and ob- 
vious; thus, cost per pound is widely used because of the generally 
satisfying thesis that as a ship, tank, or airplane increases in weight 


it becomes more costly. Weight changes alone do not always adequately 
explain cost changes, however, and additional explanatory variables are
often needed. The problem is to find these variables and their rela- 
tionship to cost. The procedure is to decide what variables are log- 
ically or theoretically related to cost and then to look for patterns 
in the data that suggest a relationship between cost and the variables. 
Table 1 contains a set of data on cost and selected variables that can 
be analyzed for such patterns. The costs of ten airborne radio commu- 
nication sets are given with the weight, power output, and frequency 
of each. It is to be expected that cost would increase with weight or 
with power output. Frequency is also included because in the past 
higher and higher frequencies have been sought to increase communica- 
tion capacity and, for a given power output, higher frequency sets 
have been more costly. 

A graphic analysis of the data in Table 1 shows that cost is not 
a simple linear function of any of the three explanatory variables. 
Cost tends to increase with weight, but there are notable exceptions 
to the trend, as illustrated by the scatter diagram of Fig. 1. Cost 
plotted against power output, as shown in Fig. 2, is even less promising,
partly because the arithmetic scale does not enable an observer to distinguish
among the points between 0.5 and 30 watts. The change from an
arithmetic to a logarithmic scale shown in Fig. 3 spreads the points in 


the low-power range and indicates that a trend may exist, but with a 


very wide scatter. 


Table 1 


TEN AIRBORNE RADIO COMMUNICATION SETS 


   Cost      Weight    Power Output    Frequency
   ($)        (lb)         (W)           (MHz)

  22,200        90          20            400
  17,300       161         400             30
  11,800        40          30            400
   9,600       108          10            400
   8,800        82          10            400
   7,600       135         100             25
   6,800        59           6            400
   3,200        68           8            156
   1,700        25           8             42
   1,600        24         0.5            258

Fig. 1--Scatter diagram of cost versus weight for sample data

Fig. 2--Scatter diagram of cost versus power output for sample data

Fig. 3--Scatter diagram on logarithmic grid of cost versus power
output for sample data


The wide scatter in Fig. 3 is explained in part by recognizing the 
effect of frequency. In Fig. 4, each point is identified by frequency 
class: High Frequency (HF), up to 30 MHz; Very High Frequency (VHF), 

30 to 300 MHz; and Ultra High Frequency (UHF), above 300 MHz. A clearer 
relationship exists between cost and power output within each frequency 
class than exists for the whole sample scattered without regard to fre- 
quency. This suggests that the sample is not homogeneous. Each fre- 
quency band may constitute a separate sample, or possibly HF and VHF 
costs are on one level and UHF costs are on another. 
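The grouping by frequency class can be sketched in a few lines of Python. This is only an illustration: the function and variable names are hypothetical, the band edges are those quoted above (with 30 MHz itself counted as HF), and the 1,700-dollar set is read as 8 W at 42 MHz, which is an assumption about one row of Table 1.

```python
def frequency_class(freq_mhz):
    """Classify a frequency in MHz using the bands given in the text."""
    if freq_mhz <= 30:
        return "HF"    # up to 30 MHz
    if freq_mhz <= 300:
        return "VHF"   # 30 to 300 MHz
    return "UHF"       # above 300 MHz


# (cost in dollars, power output in watts, frequency in MHz) from Table 1.
# The 1,700-dollar set is read as 8 W at 42 MHz (an assumption, see above).
RADIO_SETS = [
    (22200, 20, 400),
    (17300, 400, 30),
    (11800, 30, 400),
    (9600, 10, 400),
    (8800, 10, 400),
    (7600, 100, 25),
    (6800, 6, 400),
    (3200, 8, 156),
    (1700, 8, 42),
    (1600, 0.5, 258),
]


def group_by_frequency_class(sets):
    """Return {class: [(power, cost), ...]}, sorted by power within a class."""
    groups = {}
    for cost, power, freq in sets:
        groups.setdefault(frequency_class(freq), []).append((power, cost))
    return {name: sorted(points) for name, points in groups.items()}


groups = group_by_frequency_class(RADIO_SETS)
for name in ("HF", "VHF", "UHF"):
    print(name, groups[name])
```

Listing the (power, cost) pairs class by class makes it easier to judge whether cost rises with power inside each band, which is the pattern Fig. 4 is meant to reveal.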

At this point, it is not clear if any of the explanatory variables, 
either singly or in combination, will yield a useful estimating relation- 
ship, or if a single relationship can serve for all frequencies. To 
illustrate techniques that are commonly employed in deriving estimating 


relationships, assume that cost can be related to a single predictive 


Fig. 4--Identification by frequency class


variable--that of weight. The results of a linear normal simple regres- 
sion model will then be examined. Later, several variables in a multi- 
ple regression analysis will be considered, and the problem of the ap- 
parent nonhomogeneous character of the sample illustrated in Fig. 4 

will be reexamined. 

Regression has become a widely accepted tool for cost analysis, 
and it is frequently used to develop estimating relationships. The 
technique of regression analysis can be thought of as consisting of two 
distinct stages. The first is that of estimating the constant and co- 
efficients of the equation, and the second is that of inferring the re- 
liability and significance of the results of the estimate on the basis 
of assumed (and to a degree verifiable) properties possessed by the 
data and the results. Regression analysis as a technique is applicable 


only to the two stages performed together. Estimating coefficients or 
curve fitting is simply a mathematical exercise. Only when these esti-
mating procedures are used as a basis for making statistical inferences 


can they be viewed as part of a regression analysis. 


Simple Linear Regression 


The form of the relationships between cost and the explanatory 
variable(s) depends on the problem. It may reflect either an under- 
lying physical law or a structural relationship. When no particular 
functional form is suspected, a simple (two-variable) linear model is 
frequently used to describe the relationship between two variables. 


In this case, the equation of the model is 

\[ y = a + bx, \tag{1} \]


where y is the dependent variable and x is the explanatory variable. 

The symbols a and b are the constant and coefficient, respectively, of 
the equation estimated from the data. Here y could represent the cost 
of a radio communication set and x could represent the weight. If it 

is assumed that b is greater than zero, the model indicates that heavier 
equipment will cost more than lighter equipment. When the values of a 
and b are known, it is possible to compute y (cost) for any given value 


of x (weight). 


Least-squares Estimating 


Given Eq. (1), the basic problem in the first phase of the regression
analysis is to derive estimates of the parameters a and b. The
standard procedure is the method of least squares. The values of a
and b are determined by the requirement that the sum of the squares of 
the deviations of the sample observations from the estimated line will 
be at a minimum. Symbolically, this minimum is expressed as 

\[ \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{2} \]

where \(y_i\) is the ith observation and \(\hat{y}_i\) is the value of
\(y_i\) estimated from the equation

\[ \hat{y}_i = \hat{a} + \hat{b} x_i. \tag{3} \]


The carets over a and b indicate that \(\hat{a}\) and \(\hat{b}\) are
least-squares estimates of the true but unknown values of a and b. Thus
\(\hat{y}_i\) is the least-squares estimate of \(y_i\), and the term
\((y_i - \hat{y}_i)\) is the difference between each observed \(y_i\)
and the corresponding estimated value \(\hat{y}_i\). This is illustrated
in Fig. 5, which shows the actual (\(y\)) and estimated (\(\hat{y}\))
values of the dependent variable that correspond to a specific value of
the explanatory variable x. The line shown in Fig. 5 is the line that
represents Eq. (3). All of the estimated values \(\hat{y}_i\) fall on
this line. The vertical distance from point A to point B is the
difference between the actual value (\(y\)) and the estimated value
(\(\hat{y}\)). The sum of the squares of all such differences (as in
Eq. (2)) is the quantity to be minimized in estimating the line.

Fig. 5--Deviation of actual value from estimated value and sample mean

The minimum of this sum is found by substituting Eq. (3) in Eq. (2),
taking the partial derivatives of Eq. (2) with respect to a and b, and
setting the results equal to zero. This process yields two equations
that are called normal equations and that can be solved for a and b:


\[ \sum y = na + b \sum x, \]

\[ \sum xy = a \sum x + b \sum x^2, \]

where y = cost of airborne radio equipment in thousands of dollars,
      x = weight of airborne radio equipment in pounds,
      n = number of items in the sample,
      \(\sum\) = summation (e.g., \(\sum y\) = the sum of all y's).
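The differentiation step that leads to the normal equations can be written out as follows. This is a sketch of the standard derivation; the symbol S for the sum of squared deviations is introduced here only for convenience and does not appear in the text.

```latex
% S(a, b) is the sum of squared deviations of Eq. (2) with Eq. (3) substituted.
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting each partial derivative equal to zero:
\frac{\partial S}{\partial a} = -2 \sum (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad \sum y = na + b \sum x

\frac{\partial S}{\partial b} = -2 \sum x_i (y_i - a - b x_i) = 0
  \quad\Longrightarrow\quad \sum xy = a \sum x + b \sum x^2
```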


Table 2 contains the numerical values and totals required to solve the
normal equations when data from Table 1 are used. The costs are
expressed in thousands of dollars.


Table 2


DATA FOR REGRESSION ANALYSIS OF COST AND WEIGHT


     x        y        x^2          xy

    90     22.2      8,100     1,998.0
   161     17.3     25,921     2,785.3
    40     11.8      1,600       472.0
   108      9.6     11,664     1,036.8
    82      8.8      6,724       721.6
   135      7.6     18,225     1,026.0
    59      6.8      3,481       401.2
    68      3.2      4,624       217.6
    25      1.7        625        42.5
    24      1.6        576        38.4
   ---     ----     ------     -------
   792     90.6     81,540     8,739.4


When the values from Table 2 are substituted in the normal equations,
the following expressions are obtained for the sample data points
(n = 10):

\[ 90.6 = 10a + 792b, \]

\[ 8739.4 = 792a + 81{,}540b. \]


Solved simultaneously, these equations give*

\[ \hat{a} = 2.477, \qquad \hat{b} = .083, \]

and thus from Eq. (3)

\[ \hat{y} = 2.477 + .083x. \tag{4} \]
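The simultaneous solution can be checked with a short Python sketch. The function name is hypothetical; the inputs are the column totals of Table 2, and the closed-form expressions are the usual algebraic solution of the two normal equations.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


# Totals from Table 2 (x in pounds, y in thousands of dollars).
a_hat, b_hat = solve_normal_equations(
    n=10, sum_x=792, sum_y=90.6, sum_x2=81540, sum_xy=8739.4
)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Working from the totals rather than the individual observations mirrors the hand calculation in the text; only five numbers are needed to fit the line.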


The line represented by this equation is shown in Fig. 6 as the 
solid line with the actual observations plotted as dots. The extent 
of the dispersion of the observations relates inversely to the useful- 
ness of the line as a tool for estimating the values of y from the 
values of x. The greater the dispersion of observed values of y about 
the line, the less accurate the estimates that are based on the line 
are likely to be. The measure of the dispersion about the regression 
line is called the standard error of estimate (SE) of the equation and 
is shown by the dashed lines. 

One measure of dispersion in a collection of data points is called 
the variance. The variance is defined as the sum of the squared dis- 
tances to each of the data points from a central reference point divided 
by the degrees of freedom (df), which equal the number of independent 


bits of information contained in the sample. (In analyzing the data
that are given in Table 1, the degrees of freedom equal (n - 2); i.e.,
the number of observations n less the number of constraints, 1 each
for a and b.)


Fig. 6--Regression line and standard error of estimate


*Slight variations may exist in the last significant figure in the
examples throughout this section because of rounding and logarithmic
transformations.


In least-squares procedures, the central point of reference for 
calculating the variance of each variable is its sample mean, which 
causes the least-squares line to have the property of passing through 
the means of the variables used to estimate the line. This characteris- 
tic is shown in Fig. 5; it can be verified by dividing both sides of the 


first normal equation by n, since the sample mean of any variable y is

\[ \bar{y} = \frac{\sum y_i}{n}. \tag{5} \]
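Written out, the verification is a single step. The sketch below divides the first normal equation by n and applies the definition of the sample mean to both x and y:

```latex
\sum y = n\hat{a} + \hat{b} \sum x
\quad\Longrightarrow\quad
\frac{\sum y}{n} = \hat{a} + \hat{b}\,\frac{\sum x}{n}
\quad\Longrightarrow\quad
\bar{y} = \hat{a} + \hat{b}\,\bar{x}
```

The point \((\bar{x}, \bar{y})\) therefore satisfies Eq. (3), which is the statement that the least-squares line passes through the means.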


By referring to Fig. 5, it can be seen that the total distance
from \(y_i\) to \(\bar{y}\) for any observation on y is the distance
from C to B. The sum of all such distances squared and divided by the
degrees of freedom is called the total variance of y:

\[ \text{Total variance of } y = \frac{\sum (y_i - \bar{y})^2}{n - 2}. \tag{6} \]

The distance from C to A indicates the amount of the total deviation
of \(y_i\) from \(\bar{y}\) that is explained by the estimating
relationship. Consequently, the sum of the distances from \(\bar{y}\)
to the line, squared and divided by the degrees of freedom, is called
the explained variance:

\[ \text{Explained variance of } y = \frac{\sum (\hat{y}_i - \bar{y})^2}{n - 2}. \tag{7} \]

The remaining distance from A to B is the residual or unexplained
deviation from \(y_i\) to \(\hat{y}_i\); the sum of these deviations,
squared and divided by the degrees of freedom, is the unexplained
variance:

\[ \text{Unexplained variance of } y = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}. \tag{8} \]

The standard error of estimate is defined as the square root of the
unexplained variance of the y's:

\[ SE = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}. \tag{9} \]





For the equation y = 2.477 + .083x, the standard error of estimate 
is $5,808. This value has been plotted above and below the regression 
line in Fig. 6. The interpretation and significance of these results 
will be discussed in connection with the use of prediction intervals. 
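The standard error quoted above can be reproduced from the ten observations. The following Python sketch uses hypothetical function names; it fits the line by least squares and then takes the square root of the residual sum of squares divided by (n - 2) degrees of freedom.

```python
import math

# Weights (lb) and costs (thousands of dollars) of the ten sample sets.
WEIGHTS = [90, 161, 40, 108, 82, 135, 59, 68, 25, 24]
COSTS = [22.2, 17.3, 11.8, 9.6, 8.8, 7.6, 6.8, 3.2, 1.7, 1.6]


def fit_line(x, y):
    """Return the least-squares constant a and coefficient b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def standard_error_of_estimate(x, y):
    """Square root of the unexplained variance, with n - 2 degrees of freedom."""
    a, b = fit_line(x, y)
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(residual_ss / (len(x) - 2))


se = standard_error_of_estimate(WEIGHTS, COSTS)
print(f"SE = {se:.3f} thousand dollars")
```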


In comparing one SE with another, it is useful to compute a relative
standard error of estimate. One such measure is the coefficient of
variation (CV), which relates the SE to the mean of the sample y's:

\[ CV = \frac{SE}{\bar{y}}. \tag{10} \]

Continuing the analysis of the data in Table 1, the mean of the y's is
$9,060. Therefore, the value of CV is

\[ \frac{\$5{,}808}{\$9{,}060} = .641. \]

This value is high. Although the question of reliability of an estima- 
ting equation is relative to the context in which the equation is to be 
used, a value at least as small as 10 to 20 percent for the coefficient 
of variation is desirable. 
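A minimal Python sketch of this check follows. The function name is hypothetical, and the 20-percent figure is simply the upper end of the desirable range mentioned above, not a fixed rule.

```python
def coefficient_of_variation(standard_error, mean_y):
    """Relative standard error: SE divided by the mean of the sample y's."""
    if mean_y == 0:
        raise ValueError("mean of y must be nonzero")
    return standard_error / mean_y


# SE and mean cost, both in thousands of dollars, as given in the text.
cv = coefficient_of_variation(5.808, 9.060)
print(f"CV = {cv:.3f}")
print("within the 20-percent guideline:", cv <= 0.20)
```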

The standard error of estimate gives a measure of the magnitude of 
the unexplained variance. Another related measure of dispersion is 
given by the coefficient of determination that shows the proportion of 


total variance accounted for by the estimating relationship: 


\[ r^2 = \text{Coefficient of determination}
       = \frac{\text{Explained variance}}{\text{Total variance}}
       = 1 - \frac{\text{Unexplained variance}}{\text{Total variance}}. \tag{11} \]


When all the observed points in the sample are on the least-squares 
line, the coefficient of determination equals 1 and there is no unex- 
plained or residual variance. As the proportion of total variance that 
remains unexplained increases, the coefficient of determination ap- 
proaches zero. The square root of the coefficient of determination is 
called the correlation coefficient.* Correlation has no substantive


*Since total variance, Eq. (6), and the standard error, Eq. (9),


have been adjusted for degrees of freedom, the resulting correlation 
coefficient, the square root of Eq. (11), is also adjusted. Some com- 
puter programs do not adjust; the variance figures are then biased down- 
ward and the correlation coefficient will appear larger than in the 

meaning unless both the dependent and explanatory variables are assumed
to be normal random variables. The ordinary assumption in using regres- 
sion analysis for developing estimating relationships is that only the 
dependent variable is random. Consequently, it is not considered good 
practice for the correlation coefficient to be used in documenting the 
results in this particular application of regression analysis. The 
inclusion of the correlation coefficient, however, causes no serious 
problem since it is simply the square root of the coefficient of
determination. When analysts review the results, they can easily calculate
the latter from the former. Since the coefficient of determination is
always in the range between zero and one, its square root will always 
be larger, except at the boundary points of zero and one. 

The coefficient of determination for Eq. (4) is .325, which is 
relatively low and further substantiates the evidence that weight alone 
is not a good predictor of the cost of airborne radio communication 


equipment. 
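The value of .325 can be reproduced with the Python sketch below. The function name is hypothetical, and the sketch assumes that the explained and total variances are divided by the same number of degrees of freedom, so that the divisor cancels and the ratio reduces to a ratio of sums of squares.

```python
# Weights (lb) and costs (thousands of dollars) of the ten sample sets.
WEIGHTS = [90, 161, 40, 108, 82, 135, 59, 68, 25, 24]
COSTS = [22.2, 17.3, 11.8, 9.6, 8.8, 7.6, 6.8, 3.2, 1.7, 1.6]


def coefficient_of_determination(x, y):
    """Return 1 - (unexplained sum of squares / total sum of squares)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # least-squares coefficient
    a = mean_y - b * mean_x            # least-squares constant
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    unexplained_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - unexplained_ss / total_ss


r_squared = coefficient_of_determination(WEIGHTS, COSTS)
print(f"r^2 = {r_squared:.3f}")
```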


Statistical Inference 


The standard error of estimate, the coefficient of variation, and 
the coefficient of determination indicate the degree of accuracy with 
which the estimating equation describes the sample observations. How- 
ever, the analyst is primarily interested in using the estimating equa- 
tion to predict costs among the population of items that the sample 
represents; the standard error of estimate and the coefficient of de- 
termination do not furnish a good measure of the reliability of the 
estimating equation for predictive purposes. 

The problem of reliability raises other considerations. First, 


the question arises whether x and y are actually related in the manner 



unadjusted case. The practical implications of these adjustments are
minimal except in extremely small sample cases. However, to fully
understand the results, the analyst should know whether the total var-
iance, standard error, and correlation coefficient are adjusted in any
particular program or set of results. A discussion of adjustments for
degrees of freedom is given in M. J. B. Ezekiel and K. A. Fox, Methods
of Correlation and Regression Analysis, 3d ed., John Wiley & Sons,
Inc., New York, 1959, pp. 300-305.

indicated by the regression equation. A particular sample could show
such a relationship out of pure chance when, in fact, none exists. Sec- 
ond, the regression equation obtained from the sample is one of a family 
that could be obtained from different samples within the same popula- 
tion. Finally, when the equation is used to estimate a value for y 
based on an x that is outside the range of the sample, the reliability 
of the estimate of y may be suspect because the estimated relationship 
may not hold beyond the sample range or because the x is a point from 

a different population rather than an extrapolation from the sample. 

An example of an extrapolation for which the relationship might not 

hold is that of an aircraft that is much larger than any in the sample. 
The problem of moving to a new population appears in a case in which 

an aircraft is to be constructed of titanium when the sample contains 
only aluminum aircraft. In the latter case, if a substitution of tita- 
nium for aluminum is expected to increase the cost, the estimating rela- 
tionship developed from the aluminum sample may be used by an experi- 
enced analyst as an approximate indicator of the lower bound; however, 
adjustments based on such personal judgments are not a part of statisti- 
cal theory. 

Statistical inference may be used to answer the two questions that 
arise in connection with the problem of reliability. To decide whether 
x and y are actually related, test for statistical significance; to
evaluate predictions, establish a prediction interval for the regres- 
sion line. However, certain assumptions and conditions must be met 
before standard techniques of statistical inference and testing can be 
validly applied to least-squares results; namely, the data are assumed
to be a sample taken from a larger population that meets the following
conditions:


1. The x values are nonrandom (fixed) variables. 

2. The residual deviations are independent random variables 
with normal distributions. 

3. The expected value of the distribution of each of these 
random variables is zero, and the unknown variance is the 


same for all values of x.

Under these assumptions, the hypothesized relationship between y and x
becomes

\[ y_i = a + b x_i + u_i, \tag{12} \]

where i = 1, 2, 3, ..., n,
      \(u_i\) = the normally distributed random error terms with zero
            expected value and a common and unknown variance.

Further, under these assumptions, the least-squares method produces un- 
biased maximum likelihood estimators. Standard statistical techniques 
can be applied to the least-squares results to test for significance 
and to make inferences about reliability and accuracy in a probabilistic
sense.* A graphic illustration of these assumptions as they relate
to the simple (two-variable) regression case is shown in Fig. 7. 

Although the subject of statistical testing is too complex to 
treat comprehensively here, the method of testing the significance of 
the relationship between x and y in the simple regression of Fig. 6 
will be examined briefly. Basically, the procedure involves establish- 
ing the null hypothesis that x and y are not related (i.e., that b = 0), 
and testing to determine whether the hypothesis should be rejected. 
The test that is commonly used for this purpose is known as the t-test 
because it uses the t-ratio, or ratio of a coefficient to its standard 


error. For this simple regression, the ratio is expressed as

\[ t_b = \frac{\hat{b}}{s_b}, \tag{13} \]

where \(\hat{b}\) = the estimated regression coefficient (from the
            equation \(\hat{y} = \hat{a} + \hat{b}x\)),
      \(s_b\) = the standard error of \(\hat{b}\),
            \(= SE / \sqrt{\sum (x_i - \bar{x})^2}\),
      SE = the standard error of estimate as defined in Eq. (9).

The value of \(t_b\) for Eq. (4) is 1.96.


*A more comprehensive statement of these assumptions and considera-
tions is given in W. A. Spurr and C. P. Bonini, Statistical Analysis
for Business Decisions, Richard D. Irwin, Inc., Homewood, Illinois,
1967, pp. 564-565; A. M. Mood, Introduction to the Theory of Statistics,
McGraw-Hill Book Company, Inc., New York, 1950, pp. 152-154; and John
Johnson, Econometric Methods, McGraw-Hill Book Company, Inc., New York,
1963, pp. 3-9.

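The t-ratio for Eq. (4) can be recomputed from summary figures, as in the Python sketch below. The function name is hypothetical; the inputs are the rounded coefficient and standard error from the text and the sum of squared deviations of the weights, 18,813.6, which equals 81,540 - (792)(792)/10 from the totals of Table 2.

```python
import math


def t_ratio(b_hat, standard_error, sum_sq_dev_x):
    """Ratio of the regression coefficient to its standard error."""
    if sum_sq_dev_x <= 0:
        raise ValueError("sum of squared deviations of x must be positive")
    s_b = standard_error / math.sqrt(sum_sq_dev_x)  # standard error of b
    return b_hat / s_b


t_b = t_ratio(b_hat=0.083, standard_error=5.808, sum_sq_dev_x=18813.6)
print(f"t_b = {t_b:.2f}")
```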

A standard table of t-ratios is required to use Eq. (13) to test
the null hypothesis.* The relevant row is shown in Table 3. If the
calculated value \(t_b\) falls below the appropriate value of t selected
from this table, the null hypothesis that b = 0 would be accepted, and
it would be concluded that b is, in fact, not significantly different
from zero. The level of significance above each of the t-values in-
dicates the probability that the calculated value could be as high
strictly by chance as the values that are shown in the table. In other
words, these levels of significance indicate the probability that the
null hypothesis will be rejected when it is true.


Table 3


VALUES OF t-RATIOS FOR 8 DEGREES OF FREEDOM
(One-sided Test)


Degrees of                  Level of Significance
Freedom                  .15      .10      .05     .025

    8       t-Ratio     1.108    1.397    1.860    2.306


If there were evidence to justify the assumption that the sign of
the coefficient could be only positive (or only negative) if it were
different from zero, the level of significance associated with each t
could be read directly from Table 3. However, the common practice in


* 
All of the references in the Bibliography to this section contain 
t-tables. 

Fig. 7--Simple linear population regression model


regression analysis is not to make this assumption, but to test as 
though the value of t (if it were different from zero) could be either 
positive or negative. Because of the symmetry of the distribution of 
the t-ratios, the level of significance for the two-sided test is twice 
the level of significance for the one-sided test. Thus, the levels of 
significance of the t-values shown in the table are only half the actual 
levels for the two-sided test. For example, the value 1.86 has a level 
of significance of .05. For the two-sided test, double this amount and 
read the level of significance as .10. In the two-sided test, the 
probability is 10 percent that the absolute value of \(t_b\) is as large
as 1.86 when b is actually equal to zero. Since in the example
\(t_b\) = 1.96, if a 10-percent probability of rejecting the null
hypothesis when it is true is acceptable, the hypothesis that b = 0 is
rejected, and the relationship is considered significant. On the other
hand, if a .05 level of significance (t = 2.306) seems appropriate, the
hypothesis must be accepted. In this case, the coefficient of x, and
therefore the equation, is considered as not significant.
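The two-sided decision rule can be sketched in Python as a table lookup. The names are hypothetical. The critical values are the four t-ratios of Table 3, keyed here by the two-sided level in percent, that is, twice the one-sided level of a standard t-table for 8 degrees of freedom.

```python
# Two-sided level of significance (percent) -> critical t for 8 degrees
# of freedom. Each level is twice the one-sided level of the t-table.
CRITICAL_T_8DF = {30: 1.108, 20: 1.397, 10: 1.860, 5: 2.306}


def reject_null(t_value, two_sided_percent, table=CRITICAL_T_8DF):
    """Return True if the hypothesis b = 0 is rejected at the given level."""
    if two_sided_percent not in table:
        raise KeyError(f"no critical value tabulated for {two_sided_percent}%")
    return abs(t_value) > table[two_sided_percent]


t_b = 1.96  # t-ratio for Eq. (4), as given in the text
for level in (10, 5):
    verdict = "rejected" if reject_null(t_b, level) else "accepted"
    print(f"{level}-percent level: hypothesis b = 0 is {verdict}")
```

Taking the absolute value of the ratio is what makes the test two-sided: a large negative ratio counts against the null hypothesis just as a large positive one does.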

The question at this point is, What should the level of significance
be for rejecting the hypothesis? Unfortunately, no simple answer
is possible. The values of .10, .05, and .01 are those that are
most commonly used, but the analyst must make a decision based on the
risk that is assumed when a true hypothesis is rejected.* For the
purpose of this discussion, we will accept a value of .10 in testing
significance and in establishing a prediction interval for the
regression line.


Prediction Intervals 


The procedure for calculating the prediction interval for a simple
regression is as follows. For a given value of the explanatory
variable, say x, the estimating equation is used to obtain a predicted
value of the dependent variable:

\[ \hat{y} = \hat{a} + \hat{b}x. \tag{14} \]

The prediction interval puts a boundary around \(\hat{y}\):

\[ \hat{y} \pm A_{\epsilon/2}. \tag{15} \]

There is a certain level of confidence (1 - ε) that the cost of a set
weighing x will be in that interval.


* 

A more comprehensive discussion of the use of statistical tests 
is given in W. A. Wallis and H. V. Roberts, Statistics, The Free Press, 
New York, 1963, pp. 399-402, 413-426. 


a 
For further discussion, see W. A. Spurr, L. S. Kellogg, and 
J. H. Smith, Business and Economic Statistics, Richard D. Irwin, Inc.,
Homewood, Illinois, 1961, pp. 251-255. 

Values for ε/2 rather than ε are used since \(\hat{y}\) is to be bounded
on both sides. The value of ε can be divided by two since, under the
assumptions, the probability distribution about \(\hat{y}\) is normal
and therefore is symmetrical. In statistical terminology, a two-tailed
t distribution is used for constructing the intervals.

In the case of simple regression, a 100(1 - ε)-percent prediction
interval for an estimated value of the dependent variable can be
constructed as follows:

\[ \hat{y} \pm A_{\epsilon/2}, \tag{16} \]

where

\[ A_{\epsilon/2} = (SE)(t_{\epsilon/2})
   \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}, \tag{17} \]

and where SE = the standard error of the estimating equation from which
            \(\hat{y}\) was obtained,
      \(t_{\epsilon/2}\) = the value obtained from a table of t-values
            for the ε/2 significance level,
      n = the size of the sample,
      x = the specified value of the explanatory variable used as
            a basis for obtaining \(\hat{y}\),
      \(\bar{x}\) = the mean of the x's in the sample,
      \(\sum (x_i - \bar{x})^2\) = the sum of the squared deviations of
            the sample x's from their sample mean.

When the estimating equation derived previously is used, the cost
of a communications set weighing 100 lb is estimated at $10,777. To
establish around this value a 90-percent prediction interval (i.e.,
one with a 10-percent level of significance), the necessary data are

      SE = 5.808,
      ε = 0.1,
      ε/2 = 0.05,
      \(t_{\epsilon/2}\) = 1.86,
      n = 10,
      df = 8,
      x = 100 lb,
      \(\bar{x}\) = 79.2 lb,
      \(\sum (x_i - \bar{x})^2\) = 18,813.6 lb².

By substituting these data in Eq. (17), solving for \(A_{\epsilon/2}\),
and multiplying by 1000, we obtain

      \(A_{\epsilon/2}\) = $11,447.

Therefore, for x = 100 lb, the 90-percent prediction interval in
dollars is

      \(\hat{y} \pm A_{\epsilon/2}\) = $10,777 ± $11,447.
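The substitution can be carried out with the Python sketch below. The function name is hypothetical, and the half-width expression is the standard prediction-interval formula for simple regression, assumed here to be the form of Eq. (17); with the data listed above it reproduces the quoted result to within rounding in the last digit.

```python
import math


def prediction_half_width(se, t_value, n, x0, mean_x, sum_sq_dev_x):
    """Half-width of the prediction interval for a new observation at x0."""
    return se * t_value * math.sqrt(
        1 + 1 / n + (x0 - mean_x) ** 2 / sum_sq_dev_x
    )


x0 = 100                         # weight of the set, lb
y_hat = 2.477 + 0.083 * x0       # Eq. (4), thousands of dollars
half_width = prediction_half_width(
    se=5.808, t_value=1.86, n=10, x0=x0, mean_x=79.2, sum_sq_dev_x=18813.6
)
print(f"estimate   : ${1000 * y_hat:,.0f}")
print(f"half-width : ${1000 * half_width:,.0f}")
print(f"interval   : ${1000 * (y_hat - half_width):,.0f}"
      f" to ${1000 * (y_hat + half_width):,.0f}")
```

The lower bound comes out negative, which is one more sign that an interval this wide is of little practical use for a 100-lb set.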


The percentage 100(1 - ε) is the confidence level of the prediction
intervals, which means that if repeated observations on the cost of
communications sets that weigh 100 lb were taken, 100(1 - ε) percent
of the time these observations would lie within the range set by the
100(1 - ε)-percent prediction intervals. This is the only sense in
which a level of confidence can be associated with prediction
intervals. It is erroneous to infer that there is a 100(1 - ε)-percent
probability that the actual value for any particular case will lie
within the interval.

Further, prediction intervals are valid outside the range encom- 
passed by the sample data that are used to generate the estimating re- 
lationship and the interval only if the estimating relationship is it- 
self valid outside that range. For example, if there were occasion 
for the line to curve up or down or if a discontinuity in the form of 
a discrete jump in cost occurred for weights outside the sample range, 
this fact would not be reflected in the prediction interval. Thus, it 
must be clearly indicated when the intervals are used for estimates 
based on values outside the sample range. 

This prediction interval procedure can be repeated for other val- 


ues of x and the results plotted to obtain a 90-percent prediction 
interval band around the regression line, as shown in Fig. 8. In this
case, the 90-percent confidence region is fairly wide because of the 
relatively large standard error of this equation. The formula for the 
prediction interval is such that the width of the interval is sensitive 
to the size of the standard error; large standard errors indicate that 
much of the cost variation in the observed data is unexplained by the 


equation. 

Fig. 8--The 90-percent prediction interval band
for estimated costs based on sample data


The prediction interval becomes wider as values of x that are farther
from the mean of the sample are selected. From Eq. (4), the
prediction interval (multiplied by 1000) for the mean 79.2 lb is $9,051
± $11,329; for x = 200 lb, the prediction interval is $19,077 ± $14,794.
In the latter case, the width of the interval is about 1.3 times the
width for the mean weight. This change in the size of the prediction
interval occurs because the formulas are derived to allow for the
possibility that the estimated values of a and b differ from the true
values of a and b. Such a situation can occur when the sample data
contain chance fluctuations that prevent the data from reflecting the
true relationship that exists in the total population or when there are
not sufficient data in the sample.
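The widening of the band can be traced numerically. The Python sketch below uses hypothetical names and the same assumed half-width formula with the sample statistics quoted in the text; it tabulates the interval at several weights and compares the width at 200 lb with the width at the mean.

```python
import math

# Sample statistics from the text (costs in thousands of dollars).
SE, T_VALUE, N = 5.808, 1.86, 10
MEAN_X, SUM_SQ_DEV_X = 79.2, 18813.6
A_HAT, B_HAT = 2.477, 0.083


def half_width(x0):
    """Half-width of the 90-percent prediction interval at weight x0."""
    return SE * T_VALUE * math.sqrt(
        1 + 1 / N + (x0 - MEAN_X) ** 2 / SUM_SQ_DEV_X
    )


def band(weights):
    """Return (weight, estimate, half-width) rows for plotting a band."""
    return [(w, A_HAT + B_HAT * w, half_width(w)) for w in weights]


for weight, estimate, width in band([25, 79.2, 100, 161, 200]):
    print(f"x = {weight:6.1f} lb: {estimate:7.3f} +/- {width:6.3f}")

width_ratio = half_width(200) / half_width(MEAN_X)
print(f"width at 200 lb relative to width at the mean: {width_ratio:.2f}")
```

The interval is narrowest at the mean weight because the term involving the distance from the mean vanishes there; it grows as the square of that distance.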

Figure 9 illustrates the way in which errors in the estimates of
a and b affect the accuracy of estimates. The solid line represents
the true relation between x and y. The dashed line represents an
equation in which the estimated values of a and b differ from the true
values. The figure shows that the effect of these errors increases
with movement toward the extreme ranges of x.

Fig. 9--Effects of estimating errors on accuracy
of predictions

The width of the prediction interval is also sensitive to the
level of confidence that is specified and to the number of degrees of
freedom. That level was set at 90 percent (i.e., $\epsilon/2 = 0.05$).
Suppose that only a 70-percent level of confidence is required
($\epsilon/2 = 0.15$). The only change in the inputs used in the
previous calculations is the value of t. With a 90-percent level of
confidence, $t_{.05} = 1.86$; with a 70-percent level, $t_{.15} = 1.11$.
This change will make a difference in the width of the prediction
interval. Since the level of confidence is lower, the prediction
interval is narrower; for lower levels of confidence, the band will be
even more narrow. For $\epsilon = .10$ and the degrees of freedom = 8,
the value of $t_{\epsilon/2}$ is 1.86. If the degrees of freedom were
16, $t_{\epsilon/2}$ would be 1.746. Thus, if there are twice as many
degrees of freedom for an equation with the same standard error, the
prediction interval for $\epsilon = .10$ is smaller. However, the
difference in prediction interval size because of differences in
degrees of freedom is more significant for small samples than for
large samples; the value of t for any given level of significance
becomes almost constant for degrees of freedom over 30. For example,
the smallest value of $t_{\epsilon/2}$ for $\epsilon = .10$ is 1.645.
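The t-values quoted above can be reproduced without statistical tables. The following is a minimal sketch using only the Python standard library: it integrates the Student's t density with Simpson's rule and finds the critical value by bisection. The function names are illustrative, and the numerical method is a convenience chosen here, not a procedure taken from the text.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Value of t whose upper-tail area equals tail_area (bisection)."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid   # tail still too large, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2

# Tail area 0.05 corresponds to a 90-percent level, 0.15 to 70 percent.
print(round(t_critical(0.05, 8), 2))    # 1.86
print(round(t_critical(0.15, 8), 2))    # 1.11
print(round(t_critical(0.05, 16), 3))   # 1.746
for df in (30, 120, 1000):
    print(df, round(t_critical(0.05, df), 3))  # approaches 1.645
```

The last loop shows how slowly the value changes once the degrees of freedom exceed about 30.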

Before concluding this section, there are two additional points
to be made. First, even when the coefficient of determination $r^2$ is
high, it is possible for the standard error of estimate to be large.
This is explained by the fact that $r^2$ is based on a proportion and
the standard error is based on an absolute quantity:

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}, \qquad SE = \sqrt{\text{Unexplained variance}}.$$

Thus, even if the explained variance represents a high fraction of the
total variance, it is possible for the unexplained variance to be large
relative to the estimated cost. This outcome would be indicated by the
coefficient of variation.
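As a small illustration, the coefficient of variation can be computed for the simple linear equation from values quoted elsewhere in this section (standard error 5.808 and sample mean cost 9.06, both in thousands of dollars). The sketch assumes the coefficient of variation is the standard error divided by the mean of the dependent variable; the function name is illustrative.

```python
def coefficient_of_variation(standard_error, mean_cost):
    """Standard error expressed as a fraction of the mean cost."""
    return standard_error / mean_cost

# Simple linear equation: SE = 5.808, mean cost = 9.06 ($ thousands).
cv = coefficient_of_variation(5.808, 9.06)
print(round(cv, 3))  # 0.641
```

A value this large signals that the unexplained variation is large relative to the typical cost, whatever the coefficient of determination happens to be.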
Second, the statistical significance of regression relationships
does not necessarily imply existence of a causal relationship. The
following excerpt from an Institute for Defense Analyses (IDA)
memorandum* illustrates the importance of this distinction in cost
analysis:


Frequently during cost effectiveness studies, the distinction
between a "causation" cost model and a "correlation"
cost model is overlooked. A simple example will be used to 
illustrate the distinction between the two types of cost 
models and show how a sensitivity analysis performed with a 
correlation cost model, rather than a causation model, can 
lead to erroneous conclusions. 

Example: Estimate the cost of assembling a piece of 
hardware. The assembly consists merely of bolting various 
elements together. The overwhelming majority of the cost 
of the assembly process is the salary paid to the men who 
do the bolting. Careful analysis of all the available cost 
data might yield a correlation cost model given by Equation
1.


$$C = a \times w \tag{1}$$


where w is the total weight of all the bolts that go into
the assembly,
C is the cost of the assembly,
a is a regression coefficient.
By all of the various statistical measures of goodness 
of fit, Model 1 is a valid prediction equation. 
The causation cost model is given by Equation 2. 


$$C = k \times h \times n \tag{2}$$


where k is the hourly wages of the assemblers, 
h is the number of hours it takes to fasten and bolt, 
n is the number of bolts used in the final assembly, 
C is the cost of the assembly. 
It should be noted that the correlation cost model and 
the causation cost model are interrelated by Equation 3. 


$$w = B \times n \tag{3}$$


where B is the weight of a single bolt,
w is the total weight of all of the bolts that go into
the assembly.

* Morris Zusman, "Use of Cost Models in Sensitivity Analysis and
as a Design Aid," Institute for Defense Analyses, N-587(R), September
1968. In this discussion, the term correlation is used figuratively
in the sense that it is statistically significant in explaining the
amount of variance rather than in the sense that both the dependent
and independent variables are random.

Thus any design or sensitivity analysis performed on 
Equation 1, the correlation cost model, will lead to the 
correct results if Equation 3 is not violated. For example, 
an analyst would be correct in predicting that a cost reduc- 
tion would occur if he reduced the weight of the fasteners 
used by using less fasteners. He would be incorrect if he 
predicted a cost reduction would occur if he reduced the 
weight of the fasteners by substituting aluminum for steel 
bolts while keeping the number of bolts constant. The rea- 
son that a substitution of aluminum for steel bolts would 
not reduce the cost, is because the underlying relationship 
between the number of bolts and the weight of the fasteners 
(Equation 3), which is the reason for the good cost weight 
relationship of the correlation model, has been violated. 

In mathematical terms both a causation and a correla- 
tion cost model have the following properties. 


Cost = f (characteristics) (4) 


But only a causation model can be manipulated as Equation
5,


$$\text{Characteristics} = f^{-1}(\text{cost}) \tag{5}$$


The problem of determining whether a cost model is a 
correlation or a causation model is, except for the trivi- 
ally simple type of problem illustrated here, very difficult 
since all causation models can be transformed into correla- 
tion models. There exist no statistical tests to determine 
whether a model is a causation model or a correlation model. 

The types of explanatory variables used in the cost 
model generally will give a good guide as to whether a model 
is a correlation model or a causation model. For example, 
weight as an explanatory variable in a cost model where the 
material cost did not dominate, would be a good indication 
that the cost model was a correlation model. 

If the model is a correlation model and the analyst per- 
forms a sensitivity analysis, he runs the risk of violating 
the unknown underlying relationships between the correlation 
and causation models. If these underlying relationships are 
violated the sensitivity analysis will be erroneous. 


This example illustrates that regression analysis is an aid to, and not 


a substitute for, experience and understanding. 
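The bolt example in the excerpt can be traced numerically. The sketch below implements Equations 1 to 3 of the excerpt; every numeric value (wage, hours per bolt, bolt weights, bolt counts) is hypothetical and chosen only to make the arithmetic visible.

```python
def correlation_cost(a, total_bolt_weight):
    """Equation 1 of the excerpt: C = a * w."""
    return a * total_bolt_weight

def causation_cost(hourly_wage, hours_per_bolt, n_bolts):
    """Equation 2 of the excerpt: C = k * h * n."""
    return hourly_wage * hours_per_bolt * n_bolts

# Hypothetical values, for illustration only.
k, h = 10.0, 0.05                          # wage ($/hr), hours per bolt
steel_bolt, aluminum_bolt = 0.10, 0.035    # weight of one bolt (lb)

# Calibrate a on steel-bolt experience so that both models agree there.
a = causation_cost(k, h, 1) / steel_bolt

cases = [
    ("steel, 200 bolts", steel_bolt, 200),
    ("steel, 150 bolts", steel_bolt, 150),      # fewer fasteners
    ("aluminum, 200 bolts", aluminum_bolt, 200) # lighter fasteners, same count
]
for label, bolt_weight, count in cases:
    w = bolt_weight * count                     # Equation 3: w = B * n
    print(label,
          round(correlation_cost(a, w), 2),
          round(causation_cost(k, h, count), 2))
```

The two models agree while the bolt weight B stays fixed. Once aluminum is substituted, the weight-based model predicts a saving that the labor-based model does not show, which is the erroneous sensitivity result the excerpt warns about.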
Curvilinear Analysis


Until this point the analysis has been confined to a simple (one
explanatory variable) linear regression. Although a cursory examination
of the scatter diagram of cost versus weight illustrated in Fig. 1
indicates that a linear relationship may be adequate, it cannot be
concluded definitely that a curvilinear relationship might not be
preferable. These relationships can be examined by transforming the
data to permit the relationships to be estimated using linear
estimating techniques. The equation


y =a + be (18) 


can be estimated using the least-squares method by substituting for 
each x and solving the normal equations as before. 

Another type of nonlinear relationship that is frequently used and
that will be examined in discussing cost-quantity relationships in
Sec. V is of the form

$$y = a x^{b}. \tag{19}$$

For this form, a logarithmic transformation of both variables is made
to obtain an equation that is linear in the logarithms of the original
variables:

$$\log y = \log a + b(\log x). \tag{20}$$

The regression analysis is then conducted in terms of the logarithms
of the variables rather than in terms of the variables themselves.
(Throughout this section, logarithms to the base 10 will be used.)*

* It is possible to estimate relationships such as those represented
by Eq. (19) directly. For example, see C. A. Graver and H. E. Boren,
Jr., Multivariate Logarithmic and Exponential Regression Models, The
Rand Corporation, RM-4879-PR, July 1967. Although direct nonlinear
estimating techniques have some desirable properties, they are much
less widely used in cost analysis than the linear methods.
However, to permit the standard techniques of statistical inference
based on linear least-squares regression to be used, it is assumed that
the dependent variable $\log y_i$ is linearly related to the
independent variable $\log x_i$ and to the normally distributed random
variable $u_i$ by the equation

$$\log y_i = \log a + b \log x_i + u_i \qquad (i = 1, 2, \ldots, n). \tag{21}$$

When antilogarithms are used, Eq. (21) is implicitly of the form

$$y_i = a\, x_i^{b}\, 10^{u_i}. \tag{22}$$

Because of this difference in form, statistics derived for Eq. (22) are
not directly comparable with those derived for Eq. (12). Similarly,
statistics on predictions made by the two models will not be easily
comparable because in the one case error is additive and in the
logarithmic case error is exponential and multiplicative.
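The transformation can be sketched in a few lines: take base-10 logarithms of both variables, apply ordinary least squares, and convert the intercept back with an antilogarithm. The data below are hypothetical, generated exactly from a power law so the recovered coefficients can be checked; the function name is illustrative.

```python
import math

def fit_log_log(xs, ys):
    """Least-squares fit of log10(y) = log10(a) + b*log10(x); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    return 10 ** log_a, b

# Hypothetical data lying exactly on y = 0.5 * x**1.2.
xs = [10, 20, 40, 80, 160]
ys = [0.5 * x ** 1.2 for x in xs]
a, b = fit_log_log(xs, ys)
print(round(a, 4), round(b, 4))  # 0.5 1.2
```

With real data the residuals are differences of logarithms, so the fitted error is multiplicative in the original units.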

The first step in estimating the coefficients for Eq. (20) is to
convert to logarithms the data for cost (in thousands of dollars) and
for weight shown in Table 1. The next step is to calculate the
least-squares estimates of b and log a. The results of these
calculations are

$$
\begin{aligned}
\log y &= -1.0425 + 1.0241(\log x),\\
r^2 &= .560,\\
SE_{\log} &= .2763,\\
t_b &= 3.19,\\
df &= 8.
\end{aligned}
\tag{23}
$$

The antilogarithms of both sides of Eq. (23) give

$$y = (.09067)\,x^{1.0241}, \tag{24}$$

where y = cost in thousands of dollars,
x = weight in pounds.
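The conversion from Eq. (23) to Eq. (24) is a single antilogarithm of the intercept. The following sketch checks that step and confirms that the two forms give the same estimate; the function names and the trial weight of 100 lb are illustrative.

```python
import math

LOG_A, B = -1.0425, 1.0241   # intercept and slope from Eq. (23)
a = 10 ** LOG_A              # multiplier in Eq. (24)
print(round(a, 4))           # 0.0907

def cost_log_form(weight_lb):
    """Cost ($ thousands) from the logarithmic form, Eq. (23)."""
    return 10 ** (LOG_A + B * math.log10(weight_lb))

def cost_power_form(weight_lb):
    """Cost ($ thousands) from the power form, Eq. (24)."""
    return a * weight_lb ** B

# The two forms agree for any positive weight, e.g. a trial 100 lb.
print(cost_log_form(100.0), cost_power_form(100.0))
```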
Based on the coefficient of determination $r^2$ and the calculated
t-value ($t_b$), these results appear to be slightly better than those
obtained with the linear case. However, care must be exercised in
comparing the logarithmic with the linear form and in evaluating the
logarithmic form itself. There are significant differences between the
two forms. A hint of these differences is given by the fact that the
standard error for the logarithmic case ($SE_{\log}$) is the standard
error of the logarithms of the original numbers and not the standard
error of the numbers themselves. For this reason, the standard error
for the logarithmic case ($SE_{\log} = .2763$) is about 20 times
smaller than the standard error for the arithmetic or linear case
(SE = 5.808). Thus, the relative sizes of these standard errors do not
give a direct indication of the equation that has the smaller standard
error in terms of the original numbers, which are the numbers of
interest in cost analysis.

A review of the manner in which least-squares estimators are
calculated will help to clarify this difference and to explain how
these results can be compared. The technique is to find a and b such
that

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{25}$$

is minimized. In the logarithms of the numbers, however, this is
equivalent to finding the minimum value of

$$\sum_{i=1}^{n} \left[ \log \frac{y_i}{\hat{y}_i} \right]^2, \tag{26}$$

since $(\log y_i - \log \hat{y}_i) = \log (y_i / \hat{y}_i)$. Thus, by
transforming the variables to logarithms, it is the sum of the squares
of the logarithms of the ratios of the observed to the estimated values
of y that is minimized, rather than the sum of the squares of the
differences between them.

The full impact of this change can be best illustrated by an
examination of the way in which the difference affects the calculation
of prediction intervals. To obtain prediction intervals for cost
estimates when a logarithmic equation is used, the intervals are first
calculated directly with the logarithmic data and they are then
converted to natural numbers. Thus, the end points of the interval in
logarithmic form are

$$\log \hat{y} - A_{\epsilon/2} \quad \text{and} \quad \log \hat{y} + A_{\epsilon/2}, \tag{27}$$

where

$$A_{\epsilon/2} = (SE_{\log})\, t_{\epsilon/2} \sqrt{\frac{n+1}{n} + \frac{(\log x - \overline{\log x})^2}{\sum_{i=1}^{n} (\log x_i - \overline{\log x})^2}}.$$

For the case where $\log x = \overline{\log x}$, these end points become

$$\log \hat{y} - (.2763)\, t_{\epsilon/2}\, (1.049) \quad \text{and} \quad \log \hat{y} + (.2763)\, t_{\epsilon/2}\, (1.049). \tag{28}$$

When antilogarithms of these numbers are used, the following prediction
interval end points for the $\epsilon$ level of significance are
obtained:

$$10^{\log \hat{y} - (.2763)(1.049)\, t_{\epsilon/2}} \quad \text{and} \quad 10^{\log \hat{y} + (.2763)(1.049)\, t_{\epsilon/2}}, \tag{29}$$

which are equivalent to

$$\frac{\hat{y}}{10^{(.2763)(1.049)\, t_{\epsilon/2}}} \quad \text{and} \quad \hat{y} \cdot 10^{(.2763)(1.049)\, t_{\epsilon/2}}. \tag{30}$$

These results show that the prediction interval band for the original
numbers, based on a logarithmic regression analysis, is both
nonsymmetrical and proportional to the predicted values. Further, the
standard error for the logarithmic case ($SE_{\log}$) is more
comparable with the coefficient of variation (CV) for the arithmetic
case than it is with the standard error (SE) for the arithmetic case,
because the standard error for the logarithmic case (like the
coefficient of variation for the linear case) is a proportion.
The band for the standard error is delineated by the following
locus of points for the various values of y:

$$\frac{\hat{y}}{10^{SE_{\log}}} \quad \text{and} \quad \hat{y} \cdot 10^{SE_{\log}}. \tag{31}$$

Thus, the lower and upper bounds of the standard error band at the
sample mean value of y (9.06) based on the logarithmic regression
analysis are given by the following numbers:

$$\frac{9.06}{10^{.2763}} \quad \text{and} \quad (9.06)\,10^{.2763},$$

which equal 4.80 and 17.12, respectively. When these numbers are
expressed as differences around the mean, 8.06 is obtained for the
upper half of the interval and 4.26 for the lower half.
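These bounds are easy to verify. The sketch below divides and multiplies the estimate by ten raised to the logarithmic standard error, using the values quoted in the text (9.06 and .2763); the function name is illustrative.

```python
def standard_error_band(y_hat, se_log):
    """Lower and upper one-standard-error bounds in original units."""
    factor = 10 ** se_log
    return y_hat / factor, y_hat * factor

lower, upper = standard_error_band(9.06, 0.2763)
print(round(lower, 2), round(upper, 2))                 # 4.8 17.12
print(round(9.06 - lower, 2), round(upper - 9.06, 2))   # 4.26 8.06

# Doubling the estimate doubles both bounds: the band is proportional.
print(standard_error_band(18.12, 0.2763))
```

The unequal distances below and above the estimate are the nonsymmetry discussed in the text.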

Figure 10 shows a graph of the values of the standard error for 
other values of y and the band for the 90-percent prediction intervals 
plotted above and below the regression line. These bands about the 
regression line illustrate both the nonsymmetry and the proportionality 
of these measures for the logarithmic case: nonsymmetry in that the 
distance between the regression line and the upper bounds is greater 
than that for the lower bounds; and proportionality in that the bounds 
become wider as y becomes larger. Because the standard error for the
logarithmic case is a constant percentage of y, the absolute value of
the bounds changes as the value of y changes.

Fig. 10--Logarithmic equation with standard error and 90-percent
prediction intervals

In Fig. 11, an interval of plus and minus $5,808 (the amount of
the standard error in the arithmetic case) and the standard error as
shown in Fig. 10 have been plotted about the regression line that was
obtained with the logarithmic transformation. Figure 11 illustrates
the way in which the standard error based on the logarithmic regression
analysis compares with the results that were obtained from the
arithmetic equation. The interval of plus $5,808 intersects the upper
bound of the standard error at the point where x = 65 lb. The interval
of minus $5,808 intersects the lower bound at x = 121 lb. Thus, for all
estimated values of y greater than $12,300, the interval based on the
value of the standard error of the arithmetic case is smaller than the
distance from the regression line to the lower bound of the standard
error calculated from the logarithmic analysis. Similarly, for all
estimated values of y greater than $6,500, this interval is smaller
than the distance to the upper bound (logarithmic case).
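The crossover points can be recovered algebraically rather than read from the figure. The sketch below assumes the band of Eq. (31), sets the distance from the regression line to each bound equal to the arithmetic standard error, and inverts Eq. (24) to get the corresponding weights. Variable and function names are illustrative.

```python
SE_ARITHMETIC = 5.808     # $ thousands, linear equation
SE_LOG = 0.2763           # standard error of the logarithms, Eq. (23)
A, B = 0.09067, 1.0241    # coefficients of Eq. (24)

factor = 10 ** SE_LOG

# The upper bound lies y*(factor - 1) above the line and the lower
# bound lies y*(1 - 1/factor) below it.  Solve each for the cost at
# which that distance equals the arithmetic standard error.
y_upper_cross = SE_ARITHMETIC / (factor - 1)
y_lower_cross = SE_ARITHMETIC / (1 - 1 / factor)

def weight_for_cost(y):
    """Invert Eq. (24), y = A * x**B, to get weight (lb) from cost."""
    return (y / A) ** (1 / B)

for y in (y_upper_cross, y_lower_cross):
    print(round(y, 2), round(weight_for_cost(y), 1))
```

The results are close to the rounded figures in the text: roughly $6,500 at about 65 lb for the upper bound and roughly $12,300 at about 121 lb for the lower bound.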

On the basis of these considerations, it can be seen that
comparisons of the logarithmic results and the arithmetic results are
difficult and can often be misleading. Higher coefficients of
determination for the logarithmic case do not necessarily imply that
this case is better from the viewpoint of explaining cost variance in
the original numbers. Comparisons of the standard errors for these two
cases are usually not possible without a full examination of the
differences as illustrated in Figs. 10 and 11.


Fig. 11--Comparison of standard error for logarithmic equation
with interval based on standard error from arithmetic equation
(horizontal axis: weight in pounds; vertical axis: cost in thousands
of dollars)
However, on the positive side, some relationships are, in fact,
nonlinear, and logarithmic transformations provide a practical means
for estimating nonlinear exponential relationships with linear
estimating techniques. Although there are techniques for estimating
exponential forms directly with nonlinear estimating techniques,* there
are also some difficulties in comparing and evaluating these results.
Because the direct estimating techniques for exponential forms are
nonlinear, they do not possess all the properties that are required to
permit the direct application of standard regression analysis.

Another useful application of logarithmic regression analysis 
arises in cases in which empirical evidence or experience indicates 
that the assumption of proportional variance, rather than constant var- 
iance, seems more appropriate. Frequently, a simple scatter diagram 
such as that shown in Fig. 6 is sufficient to indicate whether propor- 
tional or constant variance is more appropriate. Alternatively, the 
sample could be divided into two or more groups, and tests could be 
performed on the means of the absolute values of the residuals in the 
linear case in each group. If the higher values of the dependent vari- 
ables have residuals that are greater in value, the assumption of pro- 
portional variance would be indicated. The use of a logarithmic trans- 
formation is a convenient way to transform the data to conform to the 
requirement of proportional variance. If constant variance is assumed 
in the logarithms of the numbers, standard regression analysis can be 
performed in the logarithms. However, the assumption of constant var- 
iance in the logarithms implies proportional variance in the original 


numbers. 
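The grouping check described above can be sketched as follows. The observations are ordered by the value of the dependent variable, split into groups, and the mean absolute residual of each group is reported. The data are hypothetical and the function name is illustrative; the sketch shows only the comparison of means, not a formal significance test.

```python
def mean_abs_residuals_by_group(actual, predicted, groups=2):
    """Mean absolute residual within each group, ordered by actual value."""
    pairs = sorted(zip(actual, predicted))
    size = len(pairs) // groups
    means = []
    for g in range(groups):
        start = g * size
        end = (g + 1) * size if g < groups - 1 else len(pairs)
        chunk = pairs[start:end]
        means.append(sum(abs(y - y_hat) for y, y_hat in chunk) / len(chunk))
    return means

# Hypothetical costs and linear-equation estimates ($ thousands).
actual = [2, 3, 5, 8, 12, 20, 30, 45]
predicted = [2.2, 2.7, 5.5, 7.0, 13.5, 17.0, 35.0, 38.0]

low, high = mean_abs_residuals_by_group(actual, predicted)
print(round(low, 3), round(high, 3))  # 0.5 4.125
```

Residuals that grow with the size of the dependent variable, as in this made-up sample, point toward proportional rather than constant variance.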


* See, for example, Graver and Boren.


Multiple Regression Analysis


To this point, simple (one explanatory variable) regression analysis
has been used to examine both the linear and the nonlinear
relationship between cost and weight. With the array of data shown in
Table 1 and the logarithmic transformations of these data, multiple
(more than one explanatory variable) regression analysis will now be
examined. This section covers the multiple linear and the multiple
nonlinear (exponential) case; for the latter, logarithmic
transformations will be used. Because the sample documented in Table 1
contains only ten observations, the examination will be limited to
various combinations of two rather than three explanatory variables.
If additional observations were included in the sample, three
explanatory variables might be considered under certain circumstances;
however, this number of variables used with ten observations would
detract from the credibility of the results. In any event, there is no
great loss in limiting the number of variables to two; the essential
differences between simple and multiple regression can be illustrated
with the two-explanatory-variable case.


In the linear case, the estimating equation is of the general form

$$y = a + bx + cz. \tag{32}$$

The results for each of the possible combinations of two from the set
of three explanatory variables are as follows:

$$C = -3.752 + \underset{(2.61)}{.104}\,(W) + \underset{(1.72)}{.018}\,(F), \tag{33a}$$

$$C = 2.930 + \underset{(1.12)}{.074}\,(W) + \underset{(0.19)}{.0047}\,(P), \tag{33b}$$

$$C = -0.526 + \underset{(2.82)}{.045}\,(P) + \underset{(2.38)}{.027}\,(F), \tag{33c}$$

where C = cost in thousands of dollars,
W = weight in pounds,
F = frequency in megahertz,
P = power in watts.

The number in parentheses below each of the estimated coefficients
is the t-ratio for that coefficient. However, since an additional
variable has been added, the number of degrees of freedom for these
equations is 7 rather than 8, as it was for the simple case. Thus, the
appropriate value of t in testing the null hypothesis for each of the
coefficients is 1.895 rather than 1.860.

To understand the use of t-ratios in multiple regression equations,
the meaning of the multiple regression coefficients must be understood.
In each case, the multiple regression coefficient shows the net effect
of an explanatory variable. For example, Eq. (33a) can be interpreted
as follows: For a given frequency, a 1-lb increase in weight will cause
a $104 increase in cost. Alternatively, for a given weight, a 1-MHz
change will cause the expected cost to change by $18. As the
independence between the explanatory variables decreases, the validity
of this interpretation and the use of multiple variables diminish. For
example, if weight and frequency are related in such a way that a
change in weight cannot be assumed with frequency constant, the use of
both variables in a single multiple regression equation can produce
spurious results (e.g., the wrong sign on a coefficient, such as a
negative sign for the weight coefficient).
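The net-effect interpretation can be confirmed by evaluating Eq. (33a) at two points that differ in one variable only. The baseline weight and frequency below are hypothetical; the coefficients are those of Eq. (33a).

```python
def cost_33a(weight_lb, freq_mhz):
    """Eq. (33a): cost in thousands of dollars."""
    return -3.752 + 0.104 * weight_lb + 0.018 * freq_mhz

# Hypothetical baseline: 100 lb, 400 MHz.
base = cost_33a(100, 400)
per_pound = (cost_33a(101, 400) - base) * 1000   # dollars per added lb
per_mhz = (cost_33a(100, 401) - base) * 1000     # dollars per added MHz
print(round(per_pound), round(per_mhz))  # 104 18
```

The same differences result from any baseline, because the equation is linear; the interpretation fails only when one variable cannot in practice change while the other is held fixed.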

Fortunately, there are quantitative indicators that are useful in 
evaluating empirically the significance of such interdependencies on 
regression results. Allowance for interdependence is built into the 
formula for calculating the standard error of each coefficient in mul- 
tiple regression equations. Thus, the t-ratios in a multiple regres- 
sion not only serve to indicate the significance (or nonsignificance) 
of each of the explanatory variables but also indicate when there is 
an unacceptably strong relationship between these variables. 

From Eq. (33b), it can be seen that the inclusion of power with 
weight causes weight to become nonsignificant at the 10-percent level 
of significance. Weight was, however, significant at this level in 
the simple regression case. The coefficient of determination between 
weight and power is .533, which indicates that over 50 percent of the 
total variance in weight could be explained by a regression of weight 
on power. Thus, the adverse effect on the significance of weight that 
results from the inclusion of power can be attributed to the existence 


of interdependence between these two variables. 
As the degree of interdependence increases, regression results
become less stable and more indeterminate. As a consequence, the
t-ratio should not be the sole test for assessing the amount of
interdependence present. Further, it is not possible to give a precise
cutoff point at which explanatory variables must always be considered
too interdependent. A coefficient of .9 or more will almost certainly
cause problems; one of .3 or less usually will not. The array of
correlations and coefficients of determination among the explanatory
variables should always be examined in the early stages of analysis,
and, to the extent possible, the use of interdependent explanatory
variables should be avoided.

It is also possible for variables to be nonsignificant in multiple 
regression equations, even when there is no high level of interdepend- 
ence. For example, in Eq. (33a) the coefficient of frequency is non- 
significant at the 10-percent level although the coefficient of deter- 
mination between frequency and weight is only .091. Frequency in 
conjunction with weight is simply not a useful explanatory variable. 
Regardless of the reason, nonsignificant variables should not ordinar- 
ily be retained in regression equations used for cost estimating. Only 
one of the three multiple regression equations shown above produces an 
acceptable result: This is Eq. (33c), in which frequency and power are 
used as explanatory variables, and both are statistically significant. 
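The screening just described amounts to comparing each t-ratio with the critical value for 7 degrees of freedom. A minimal sketch, using the t-ratios reported under Eqs. (33a) to (33c) and the critical value 1.895 given above; the dictionary layout is illustrative.

```python
T_CRITICAL = 1.895   # 10-percent level, 7 degrees of freedom

t_ratios = {
    "33a": {"W": 2.61, "F": 1.72},
    "33b": {"W": 1.12, "P": 0.19},
    "33c": {"P": 2.82, "F": 2.38},
}

acceptable = []
for equation, ratios in t_ratios.items():
    weak = [name for name, t in ratios.items() if abs(t) <= T_CRITICAL]
    if not weak:
        acceptable.append(equation)
    print(equation, "nonsignificant:", weak or "none")

print(acceptable)  # ['33c']
```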

The question arises: For cost-estimating purposes, is the multiple
regression with power and frequency preferable to the simple regression
with weight as the explanatory variable? To find an answer, the other
measures by which the regression equations are judged must be compared:
the standard error of estimate, the coefficient of variation, and the
coefficient of determination. These are shown in Table 4 for each of
the multiple regressions for comparison with the results obtained from
the simple regression. The primary comparison is between the multiple
regression with frequency and power and the simple regression with
weight, since the power and frequency equation is the only one in which
both the explanatory variables are significant. For completeness,
however, the results for all three of the linear multiple regressions
are shown and will be discussed.

* In the limiting case of the two explanatory variable regressions
in which one variable is an exact linear function of another, the
regression results become completely indeterminate since the attempt
is then to fit a plane in two dimensions, and there are an infinite
number of planes intersecting each line in the two-dimensional space.

An excellent discussion of this point is found in J. Johnston,
Econometric Methods, McGraw-Hill Book Company, Inc., New York, 1963,
pp. 201-207.


Table 4


COMPARISON OF MULTIPLE-LINEAR WITH SIMPLE-LINEAR
REGRESSION RESULTS


                                 Explanatory Variables
                      -------------------------------------------
Statistical                     Weight and  Weight and  Frequency
Measures              Weight    Frequency   Power       and Power

Standard error        5.808     5.204       6.192       4.999
Coefficient of
  variation           0.641     0.574       0.683       0.552
Coefficient of
  determination       0.325     0.526       0.329       0.563
Degrees of freedom    8         7           7           7


Equation (33a), in which weight and frequency are used, appears to
give slightly better results than the simple regression on the other
measures. However, the coefficient of the frequency variable is not
significant at the 10-percent level. As a consequence, the improvement
is not a statistically significant one. The generalized test to
determine whether the incremental improvement associated with the
addition of a variable is significant uses an F-statistic.* The test
performed with this statistic is similar to the t-test. In this case,
the null hypothesis is that the increment is not significant. The
statistic used to test this null hypothesis is

$$F = \frac{\text{Increment of explained variance} \div \text{degrees of freedom}}{\text{Remaining unexplained variance} \div \text{degrees of freedom}}.$$

* See F. E. Croxton, D. J. Cowden, and S. Klein, Applied General
Statistics, 3d ed., Prentice-Hall, Inc., Englewood Cliffs, New Jersey,
1960, p. 627.
This can be rewritten as

$$F = \frac{(R^2 - r^2)/1}{(1 - R^2)/7}, \tag{34}$$

where $R^2$ = the coefficient of determination of the equation that
includes weight and frequency,
$r^2$ = the coefficient of determination of the equation with weight
alone.

Equation (34) shows only 1 degree of freedom involved in the numerator,
which is the incremental degree of freedom lost by adding another
coefficient. The degrees of freedom in the denominator equal the number
of observations in the sample less the number of coefficients estimated.
Substituting the appropriate coefficients of determination in the
formula for the F-statistic, we obtain

$$F = \frac{(.526 - .325)}{(1 - .526)/7} = \frac{(.201)(7)}{.474} = 2.97. \tag{35}$$

This value falls short of the critical value of F, which equals 3.59 at
the 10-percent level of significance (the square of the critical
t-value, 1.895). Thus, the null hypothesis is accepted, and we conclude
that the net increment in explained variance associated with the
addition of frequency to the equation containing weight is insufficient
to establish that the improvement is not due to chance.
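The incremental F-statistic is a one-line calculation once the two coefficients of determination are known. The sketch below uses the values from Table 4 (.526 with weight and frequency, .325 with weight alone); the function and argument names are illustrative.

```python
def incremental_f(r2_full, r2_reduced, added_terms, residual_df):
    """F-statistic for the explained variance gained by adding variables."""
    gained = (r2_full - r2_reduced) / added_terms
    remaining = (1 - r2_full) / residual_df
    return gained / remaining

f_value = incremental_f(0.526, 0.325, added_terms=1, residual_df=7)
print(round(f_value, 2))  # 2.97

# With one added variable, F is the square of that variable's t-ratio
# (1.72 for frequency), up to rounding of the published figures.
print(round(1.72 ** 2, 2))
```

The computed value is then compared with the tabulated critical value of F for 1 and 7 degrees of freedom at the chosen level of significance.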

In Eq. (33b), in which weight and power are used as explanatory
variables, it can be seen that the loss of the degree of freedom
associated with adding another variable more than offsets the slight
increase in the proportion of explained variance ($R^2$). As a result,
the standard error in this case is greater than it is for the case
where weight is used alone (6.192 versus 5.808). Thus, not only are the
variables not significant, but the equation would also produce slightly
less satisfactory (larger) prediction intervals than simple regression,
although the coefficient of determination is slightly larger.
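The trade-off between explained variance and degrees of freedom can be made explicit. The sketch below assumes that the squared standard error equals the unexplained variance divided by the degrees of freedom, so that the standard error of a second equation fitted to the same sample follows from the first by rescaling. Inputs are taken from Table 4; the function name is illustrative, and small discrepancies reflect rounding in the published coefficients of determination.

```python
import math

def rescaled_standard_error(se_base, r2_base, df_base, r2_new, df_new):
    """Standard error implied for another equation on the same sample."""
    ratio = ((1 - r2_new) / df_new) / ((1 - r2_base) / df_base)
    return se_base * math.sqrt(ratio)

base = dict(se_base=5.808, r2_base=0.325, df_base=8)   # weight alone

se_weight_power = rescaled_standard_error(r2_new=0.329, df_new=7, **base)
se_weight_freq = rescaled_standard_error(r2_new=0.526, df_new=7, **base)
print(round(se_weight_power, 2))  # 6.19, larger than 5.808
print(round(se_weight_freq, 2))   # 5.2
```

A gain in the coefficient of determination from .325 to .329 is too small to compensate for dividing the unexplained variance by 7 instead of 8, so the standard error rises.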

Equation (33c), in which power and frequency are used as explanatory
variables, compares favorably with the simple regression in which
weight is used, and thus far appears to be the best estimating equation
derived. However, to complete the analysis, the nonlinear equations
should be examined. These equations, expressed in the logarithms of
the original numbers, have the general form

$$\log y = \log a + b(\log x) + c(\log z). \tag{36}$$

The results for each of the possible different combinations of two that
can be developed from the set of three explanatory variables are as
follows:

$$\log C = -1.8576 + \underset{(3.78)}{1.1385}(\log W) + \underset{(1.62)}{.2743}(\log F), \tag{37a}$$

$$\log C = -0.6582 + \underset{(1.46)}{.7145}(\log W) + \underset{(.842)}{.1542}(\log P), \tag{37b}$$

$$\log C = -1.1933 + \underset{(8.44)}{.5756}(\log P) + \underset{(5.91)}{.6085}(\log F), \tag{37c}$$

where C = cost in thousands of dollars,
W = weight in pounds,
F = frequency in megahertz,
P = power in watts.

The other measures required to complete the comparisons between the
various equations are shown in Table 5.

The major patterns in the nonlinear multiple regression equations 
compared with the nonlinear simple case are similar to those for the 
linear equations. The use of both frequency and weight produces 
slightly better results, but the coefficient of the frequency variable 
is not statistically significant at the 10-percent level. The use of 
power with weight again produces a larger standard error than the sim- 
ple case although the coefficient of determination is slightly larger. 
In all respects, the best nonlinear equation is the equation that uses
power and frequency as explanatory variables. In addition, this
nonlinear equation has a significantly larger coefficient of
determination than the best linear equation. The best linear equation
also uses power and frequency and has a coefficient of determination
of .563. The nonlinear form has a coefficient of determination of .913.


Table 5


COMPARISON OF MULTIPLE-NONLINEAR WITH SIMPLE-NONLINEAR
REGRESSION RESULTS


                                     Explanatory Variables
                      ----------------------------------------------------
                                Log Weight      Log Weight   Log Frequency
Statistical           Log       and             and          and
Measures              Weight    Log Frequency   Log Power    Log Power

Standard error        0.2763    0.2518          0.2814       0.1312
Coefficient of
  determination       0.560     0.680           0.600        0.913
Degrees of freedom    8         7               7            7


The remaining question is whether the nonlinear results are suffi- 
ciently superior to the linear results to conclude that the nonlinear 
equation should be used in preference to the linear one. The standard 
error for each in the original numbers at the mean and as a percentage 
of the mean should be compared. If the results show that the standard 
error for the nonlinear case is smaller, this evidence, and the fact 
that the coefficient of determination for the nonlinear case is much 
larger, can be used as a basis to judge in favor of the nonlinear form. 

When the formulas shown in Eq. (31) are used, the end points that 


delineate the standard error at the mean for the nonlinear equation are 


9.060 


19° 1312 


and (9,060)10° ! 342 


When the end points are simplified, the following values are obtained: 
6.698 and De 2 De 


These results, expressed as differences from the mean, give values of 


2.362 below and 3.195 above the mean. Thus, the lower band of the 
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standard error for the nonlinear case is 26 percent of the mean and the 
upper bound is 35 percent. This compares favorably with the coefficient 
of variation from the linear case, which is about 55 percent. Thus, 
given the inherent limitations of the small sample size of 10, the use 
of the nonlinear form improves the results significantly. The preferred 


equation is 


$\log C = -1.1933 + .5756(\log P) + .6085(\log F)$, (38)

or

$C = (.0641)\,P^{.5756}\,F^{.6085}$,

where C = cost in thousands of dollars,
P = power in watts,
F = frequency in megahertz,
log = logarithm base 10.
This equation is also acceptable on logical grounds since the estimated 


relationships between cost and power and cost and frequency are positive. 
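The arithmetic behind Eq. (38) and the standard-error end points can be checked with a short script. The following is a minimal sketch, not part of the original study: the function names are illustrative, and the power and frequency passed to `estimate_cost` at the end are hypothetical inputs chosen only to exercise the equation.

```python
import math

# Coefficients of Eq. (38) as printed in the text (logarithms base 10).
INTERCEPT = -1.1933
POWER_EXPONENT = 0.5756
FREQUENCY_EXPONENT = 0.6085

STANDARD_ERROR_LOG = 0.1312  # standard error of the log-log fit (Table 5)
MEAN_COST = 9.060            # cost at the mean, $ thousands


def estimate_cost(power_watts, frequency_mhz):
    """Return the Eq. (38) cost estimate in thousands of dollars."""
    log_cost = (INTERCEPT
                + POWER_EXPONENT * math.log10(power_watts)
                + FREQUENCY_EXPONENT * math.log10(frequency_mhz))
    return 10 ** log_cost


def standard_error_band(value, se_log=STANDARD_ERROR_LOG):
    """Return the (lower, upper) one-standard-error end points around value.

    The error is additive in logarithms, so it is multiplicative in the
    original units: divide and multiply by 10**se_log.
    """
    factor = 10 ** se_log
    return value / factor, value * factor


lower, upper = standard_error_band(MEAN_COST)
print(f"end points at the mean: {lower:.3f} and {upper:.3f}")
print(f"below the mean: {(MEAN_COST - lower) / MEAN_COST:.0%}")
print(f"above the mean: {(upper - MEAN_COST) / MEAN_COST:.0%}")

# Hypothetical equipment: 100 W at 300 MHz (illustrative values only).
print(f"estimate: {estimate_cost(100.0, 300.0):.2f} thousand dollars")
```

Because the band is symmetric in logarithms, it is asymmetric in dollars, which is why the text reports different percentages below and above the mean.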


Documentation 


Once an estimating relationship has been developed, a report that
documents the data, assumptions, and analytical results is
indispensable. The following guidelines for preparing the report are suggested:


1. Describe the scope and coverage of the study and of the equa- 
tions that have been developed. 

2. Assuming that the study has provided for a survey of work 
already performed in the area of interest (a desirable part 
of any cost-research study), prepare a summary of the survey 
results. 

3. Describe the major input data used in the study. The raw and
adjusted data, which include data for both the dependent and
explanatory (independent) variables, should be documented to
the extent that is feasible. Include data not only for those
cost categories and characteristics used in the final estimating
equations, but also for those characteristics that were
considered but were eliminated in the process of analysis.
Describe and explain fully any adjustments to the raw data;
indicate limitations and accuracy. Because one of the outputs
of a cost-research study is the data base itself, documentation
should be such that the data base will be useful in future
studies.

4. Identify sources and dates of the data.

5. Define each dependent and explanatory variable considered in
the study. (Unambiguous definitions of weapon system
characteristics and cost elements are usually more involved than
appears at first glance.)

6. Provide the major dependent- versus single-explanatory-variable
scatter diagrams used in the study. The diagrams should be
labeled to identify each data point.

7. Document the final equations as well as the other major
equation forms examined in the study; include such statistics as
the standard error of estimate, coefficient of determination,
coefficient of variation, and prediction intervals to the
extent that they are derived for each equation. Other criteria
that are considered appropriate for indicating the goodness of
fit and prediction capabilities of the equations should be
described.

8. For the major final equations, prepare a table such as Table
6 to show the observed values of the dependent variables, the
estimated values, the deviations, and the percent deviation
from the observed values. In addition, prepare a scatter
diagram, such as that illustrated in Fig. 12, on which the
observed values versus the estimated values are plotted. The
points on the diagram should be labeled to identify each item.
(Figure 12 shows that the apparent problem of stratification
illustrated in Fig. 4 has been eliminated by including
frequency as an explanatory variable.)

9. Describe the alternative equations that were considered and
why they were rejected. The report should convey a sense of
the improvement that results from a high degree of selectivity
in choosing the final forms. The alternative equations could
show

a. The use of different explanatory variables;

b. Different forms of the equations, e.g., linear,
multiplicative (linear in the logarithms), or other nonlinear
forms;

c. The use of different forms of the dependent variables,
e.g., cost per pound or cost per item;

d. The use of stratified dependent variables grouped into
subcategories that are determined by such factors as ship
or missile type, weight, frequency, or speed regime.

10. Describe any special methodology in an appendix if only of
special interest (e.g., a sophisticated mathematical approach).

11. Describe the cost-estimating methods fully and clearly. It
should be possible to reconstruct the results of the study
from the data base as it is given in the report. The major
assumptions, statistical and otherwise, used in the derivation
of the equations should be explicitly stated.


Table 6


ACTUAL AND ESTIMATED COSTS OF AIRBORNE COMMUNICATION EQUIPMENT


                          Deviation
Actual      Estimated     (Actual less
Cost        Cost          estimate)       Percent
($)         ($)           ($)             Deviation

22,200      13,768        +8,432          +38
17,300      15,970        +1,330          +8
11,800      17,388        -5,588          -47
 9,600       9,238          +362          +4
 8,800       9,238          -438          -5
 7,600       6,435        +1,165          +15
 6,800       6,885           -85          -1
 3,200       4,581        -1,381          -43
 1,700       2,062          -362          -21
 1,600       1,261          +339          +21


Average of absolute value of percent deviations = 20


[Figure: scatter diagram; axes are actual cost ($ thousands) and
estimated cost ($ thousands)]


Fig. 12--Actual cost versus estimated cost


12. Provide an example to illustrate the procedure for using the
final cost-estimating relationship.

13. Describe the limitations of the final equations as
specifically as possible. State the range of characteristics over
which the estimating procedure applies and any other
restrictions on the population covered by the equations.
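The comparison called for in the guidelines (observed value, estimate, deviation, and percent deviation, as in Table 6) is simple to reproduce. The sketch below recomputes the deviation columns from the actual and estimated costs of Table 6; the function names are illustrative, and the second estimate is read as 15,970, the value consistent with the printed deviation of +1,330.

```python
# (actual, estimated) costs in dollars, read from Table 6.
# Assumption: the second estimate is 15,970, which agrees with the
# printed deviation (+1,330) and percent deviation (+8).
COSTS = [
    (22_200, 13_768), (17_300, 15_970), (11_800, 17_388), (9_600, 9_238),
    (8_800, 9_238), (7_600, 6_435), (6_800, 6_885), (3_200, 4_581),
    (1_700, 2_062), (1_600, 1_261),
]


def deviation_table(pairs):
    """Return rows of (actual, estimated, deviation, percent deviation)."""
    rows = []
    for actual, estimated in pairs:
        deviation = actual - estimated          # actual less estimate
        percent = 100.0 * deviation / actual    # relative to the observed value
        rows.append((actual, estimated, deviation, percent))
    return rows


def mean_absolute_percent_deviation(rows):
    """Average of the absolute values of the percent deviations."""
    return sum(abs(row[3]) for row in rows) / len(rows)


rows = deviation_table(COSTS)
for actual, estimated, deviation, percent in rows:
    print(f"{actual:>7,} {estimated:>7,} {deviation:>+7,} {percent:>+5.0f}")
print("average absolute percent deviation = "
      f"{mean_absolute_percent_deviation(rows):.0f}")
```

Percent deviation is taken relative to the observed (actual) value, as guideline 8 specifies.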
IV. USE OF COST-ESTIMATING RELATIONSHIPS


THE WIDESPREAD USE of estimating relationships in the form of simple 
cost factors, equations, curves, nomograms, and rules of thumb attests 
to their value and to the variety of situations in which they can be 
helpful. But an estimating relationship can only be derived from in- 
formation on past occurrences, and the past is not always a reliable 
guide to the future. As all horseplayers know, the favorite runs out 
of the money often enough to prove that an estimate based on past per- 
formance is very likely to be wrong. Admittedly, there may be other 
factors at work in a horserace, but the problem remains the same as 
that encountered in any attempt to predict the course of future events, 
i.e., how much confidence can be put in the prediction? This question 
dominates all other considerations in any discussion of the use of esti- 
mating relationships. 

These remarks are not intended to depreciate the value of estimat- 
ing relationships. They are an important tool in an estimator's kit 
and, in many cases, the only tool. Thus, it is essential that their 
limitations be understood to preclude their improper use. The limita- 
tions of estimating relationships stem from two sources: first, the 
uncertainty inherent in any application of statistics and second, the 
uncertainty that an estimating relationship is applicable to a partic- 
ular article. The first pertains primarily to articles well within 
the bounds of the sample on which the relationship is based; even here, 
uncertainty may be found. The second source refers to those cases in 
which the article has characteristics somewhat different from those of 


the sample. Although extrapolation beyond the sample is universally
deplored by statisticians, it is universally practiced by cost analysts
in dealing with advanced hardware because, in most instances, it is
precisely those systems outside the range of the sample that are of
interest. The question is whether the equation is relevant to the case
under investigation, although good statistical practice would question
the validity of such an approach.


Characteristics of the Estimating Relationship 


The degree of emphasis placed on statistical treatment of data can 
cause two fundamental points to be overlooked: first, that an estimat- 
ing relationship must be reasonable and second, that it must have pre- 
dictive value. 

Reasonableness can be tested in various ways--by inspection, by 
simple plots, and by complicated techniques that involve an examination 
of each variable over a range of possible values. Inspection will often 
suffice to indicate that an estimating relationship is not structurally 
sound. For example, the following equation is the result of an exer- 
cise at the Air Force Institute of Technology in which students were 


asked to develop cost-estimating relationships for small missiles: 
C = 8347.5 + 150.6W - 1149.1R, (1) 


where C = cost of airframe + guidance and control, 

W = weight in pounds, 

R = range in miles. 
This equation fits the data very well, but it states that as range in- 
creases, the cost decreases; such an assumption appears to be in error. 
If cost is a function of range, the relationship should be direct 
rather than inverse. To investigate further, choose two hypothetical 
but reasonable values for W and R within the range of the sample data: 
38.5 - 157 lb for W, 5.0 - 14.8 mi for R. Table 1 shows that Missile 
B, although heavier and with greater range than Missile A, is estimated 
as the cheaper of the two, which is contrary to experience. A
reexamination of the sample data and the equation is in order.


Table 1


SAMPLE COST COMPARISON OF TWO MISSILES


               Airframe Weight                     Estimated Airframe Cost
Hypothetical   + Guidance and Control   Range      + Guidance and Control
Missile        (lb)                     (mi)       ($)

A              50                       5          10,132
B              75                       10         8,152
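The reasonableness test applied to Eq. (1) amounts to evaluating the equation at a few plausible points and checking the direction of each effect. A minimal sketch follows, using the coefficients of Eq. (1) and the two hypothetical missiles of Table 1; the function name is illustrative.

```python
def missile_cost(weight_lb, range_mi):
    """Eq. (1): cost of airframe plus guidance and control, in dollars."""
    return 8347.5 + 150.6 * weight_lb - 1149.1 * range_mi


cost_a = missile_cost(weight_lb=50, range_mi=5)
cost_b = missile_cost(weight_lb=75, range_mi=10)
print(f"Missile A: ${cost_a:,.0f}")
print(f"Missile B: ${cost_b:,.0f}")

# Missile B is heavier and has greater range, yet it is estimated as cheaper.
if cost_b < cost_a:
    print("Eq. (1) ranks the more capable missile as the cheaper one.")

# Effect of one additional mile of range at constant weight: always negative,
# because the range coefficient is negative.
print(f"cost change per added mile: "
      f"${missile_cost(50, 6) - missile_cost(50, 5):,.1f}")
```

The same check, repeated over the sample range of 38.5 to 157 lb and 5.0 to 14.8 mi, would expose the inverse relationship without any reference to measures of fit.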


When an estimating relationship is developed to make a particular 
estimate, it may have little predictive value outside a narrow range. 
As an example, consider the following equation for estimating the cost
of solid-propellant motors for small missiles:


$\text{Cost} = 1195.6 + .000003\,I^{2}$, (2)


where I = total impulse.


The equation fits the sample data very well: 


Missile    Observed Cost    Estimated Cost
Motor      ($)              ($)

A          2600             2660
B          1700             1693
C          1250             1265
D          1750             1781


If it were appropriate to use statistical measures for a sample of 4, 
Eq. (2) explains over 99 percent of the total variance. But, note that 
the constant 1195.6 accounts for 94 percent of the cost of Motor C and 
that the cost of all motors smaller than Motor C will be about $1200. 
Because of the $I^{2}$ term, the influence of total impulse is likely to be
too pronounced for motors larger than those in the sample.

A common method of examining the implications of an estimating re- 
lationship for values outside the range of the sample is to plot a scal- 
ing curve as shown in Fig. 1. Scaling curves may be plotted on either 
arithmetic or logarithmic graph paper as Fig. 1 illustrates; cost ana- 
lysts usually prefer the log-linear representation. The theory on which 
a scaling curve is based is as follows: As an item increases in weight 


(or another dimension), the incremental cost of each additional pound
(or square foot, watt, horsepower) will decrease or increase in a
predictable way. Thus, in Fig. 1 the cost per pound of an electrical
power subsystem in a manned spacecraft decreases from about $4200 to
$1400 as the total weight increases from 100 to 1000 lb. The slope of
the curve is fairly steep; if the curve were extended to the right, it
might be expected to flatten. Eventually, the curve might become
completely flat at the point at which no more economies of scale can be
realized, but it is unlikely that the slope would ever become positive.


[Figure: two panels, arithmetic and logarithmic grids; axes are cost per
pound ($ thousands) and dry weight (lb)]


Fig. 1--Scaling curve: cost per pound versus dry weight

Now examine Fig. 2 in which total impulse is plotted against cost 
per pound-second based on values obtained from an estimating relation- 
ship. Two differences are immediately seen. First, the lefthand por- 
tion of the curve is unusually steep. Second, the slope becomes posi- 
tive when total impulse exceeds about 22,000 lb-sec. In some instances, 
fabrication problems increase with the size of the object being fabri- 
cated and a positive slope may result. No such problems are encountered 
in the manufacture of small, solid-propellant rocket motors, however, 
and continued economies of scale are to be expected. 
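The shape of such a scaling curve can be examined numerically before anything is plotted. The sketch below does so for the motor equation, Eq. (2), under two assumptions that should be kept in mind: the impulse term is taken to be quadratic, and the coefficient is the rounded value printed in the text, so the turning point it yields (roughly 20,000 lb-sec) only approximates the 22,000 lb-sec read from Fig. 2. Function names are illustrative.

```python
import math

FIXED_COST = 1195.6     # constant term of Eq. (2), dollars
COEFFICIENT = 0.000003  # printed (rounded) coefficient of the impulse term
# Assumption: the impulse term in Eq. (2) is quadratic.


def motor_cost(total_impulse):
    """Estimated motor cost in dollars for a total impulse in lb-sec."""
    return FIXED_COST + COEFFICIENT * total_impulse ** 2


def cost_per_lb_sec(total_impulse):
    """Scaling-curve ordinate: dollars per pound-second of total impulse."""
    return motor_cost(total_impulse) / total_impulse


# cost / I = FIXED_COST / I + COEFFICIENT * I, which is a minimum where
# FIXED_COST / I**2 = COEFFICIENT. Beyond that point the slope is positive.
turning_point = math.sqrt(FIXED_COST / COEFFICIENT)

for impulse in (2_000, 5_000, 10_000, 20_000, 40_000):
    print(f"{impulse:>6,} lb-sec: ${cost_per_lb_sec(impulse):.4f} per lb-sec")
print(f"slope turns positive near {turning_point:,.0f} lb-sec")
```

A tabulation of this kind shows both defects noted in the text: the steep left-hand portion, where the constant term dominates, and the upturn at large impulse, where the squared term dominates.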

Figure 2 illustrates another point: A more useful estimating
relationship could have been obtained by drawing a trend line rather than
by fitting a curve to the four data points. With a small sample, it is
often possible to write an equation that fits the data perfectly, but
the equation is useless outside the range of the sample. Statistical
manipulation of a sample this size rarely produces satisfactory results.


[Figure: axes are cost per pound-second ($) and total impulse]


Fig. 2--Cost per pound-second versus total impulse


A final example of the kind of error that undue reliance on
statistical measures of fit may bring about is based on an estimating
equation for aircraft airframes. Initially, the equation for estimat- 
ing airframe production labor hours was based on a sample of 44 air- 
craft. It then seemed that a grouping of the aircraft by type should 
give better correlation and, in fact, when the bombers, fighters, 
trainers, and cargo aircraft were considered separately, the average 
deviation between estimates and actual values was markedly reduced. 
For example, in the case of trainer aircraft, the average deviation 
was reduced from 20 to 6 percent, and a more useful estimating rela- 
tionship was obtained. In the case of fighters, however, although 
average deviation was reduced from 15 to 11 percent, the estimating 


equation exhibited the flaw shown in Eq. (3):


Manufacturing hours = 4.28 W^[...] S^[...], (3)


The exponent of weight is greater than 1.0, which means that when speed
is held constant and weight increased, the man-hours per pound of air- 
frame weight will increase. This can be seen in Fig. 3. The dashed 
lines show scaling curves derived from the total sample of 44 aircraft. 
These portray the normal relationship--as weight increases, hours per 
pound decrease. The regression equation gives the opposite results 
because the general trend in fighter aircraft has been for increased 
speed to be accompanied by increased weight, which causes an emphasis 
on the weight variable. It cannot be assumed, however, that all new 
fighters will conform to this trend; the equation, if used at all, would 
have to be used with great care. 

Fig. 3--Comparison of regression lines with scaling curves


The advice is frequently given that an estimating relationship
should not be used mechanically. This implies (1) that the function
must be thoroughly understood and (2) that the hardware involved must
be understood as well. To illustrate the first point, examine an
estimating relationship for direct manufacturing hours derived from a
sample of Navy and Air Force airframes:


$H_{100} = 1.45\,W^{0.74}\,S^{0.43}$, (4)


where $H_{100}$ = manufacturing labor hours required to produce the 100th
airframe,

W = gross takeoff weight in pounds,

S = maximum speed in knots.
The multiple correlation coefficient is 0.98 and the coefficient of
variation is .016 in logarithmic terms. Despite these satisfactory
measures of fit, a comparison of the actual manufacturing hours for each
airframe in the sample with those estimated by the equation provides a
better understanding of how the relationship relates to the real world.
In such a comparison, as shown by Table 2, 33 percent of the estimates
differ from the actuals by more than 20 percent, and 7 percent differ 
by more than 30 percent. These figures imply that an analyst with only 
the estimating relationship on which to rely may or may not obtain a 
good estimate. However, if the less acceptable results can be explained 
in some way, the analyst is then in a much better position to understand 
the strengths and weaknesses of the equation. 

Table 2


COMPARISON OF ACTUAL AND ESTIMATED
MANUFACTURING HOURS


Difference Between
Actual Hours and      Number        Percentage
Estimated Hours       of            of
(%)                   Airframes     Sample

10 or less            15            56
11-20                  3            11
21-30                  7            26
31-40                  2             7


Since this estimating relationship is based on gross takeoff weight
and maximum speed, an initial hypothesis to explain the variations might
be that the estimates decrease in quality at one end of the weight or
speed range or in certain combinations of weight and speed. In this
case, however, as shown in Fig. 4, the poorer estimates are scattered
throughout the sample, which indicates no consistent bias because of
the explanatory variables.


[Figure: plot of the sample airframes; recovered axis label: maximum speed]


Fig. 4--Plot of sample data


A second hypothesis might be that the manufacturing history of the
airframes in the sample explains the discrepancies and, in general, this
hypothesis is valid. Of the nine airframes in the sample for which
estimates differed from actuals by 20 percent or more, several were
considered problem airframes, i.e., airframes for which the manufacturer
encountered an abnormal number of problems in meeting weight and
performance specifications. Interestingly enough, these were not aircraft
in which a major state-of-the-art advance was being attempted. Another
cause for discrepancy was the interspersion of different models of the
same aircraft in a single lot: For example, reconnaissance versions
of a bomber were interspersed among bomber airframes. Situations of
this kind increase direct labor requirements. The two airframes for
which the estimates were the poorest and for which almost 40 percent
less labor than the equation predicted was required, were vastly
different ones--a large transport and a supersonic fighter. Production
of one of these airframes benefited from the manufacturer's concurrent
experience with a commercial airplane of similar configuration. The
other case cannot be explained. The amount of labor involved in
producing the airplane was unusually low.

Although it is not possible to resolve all uncertainties with the 
information available, an estimator can feel reasonably confident that 
the estimating relationship does not contain a systematic bias, that 
it should be applicable to normal production programs, and that it 


provides reasonable estimates throughout the breadth of the sample. 


Hardware Considerations 


The sample included aircraft having gross takeoff weights of
6100 lb to 450,000 lb and maximum speeds of 300 kn to 1200 kn. Suppose
that a proposed new aircraft has a gross weight of 600,000 lb and a
maximum speed of 1700 kn. Should Eq. (4) be used as the estimating
equation in this case? The same question could arise for an aircraft 
with weight and speed that are in the sample range, but which is to be 
fabricated by a new process or out of a new material. Again, the esti- 
mator must decide whether the equation is relevant or how it can be 
modified to be useful. An estimating relationship can be used properly 
only by a person familiar with the type of equipment whose cost is to 
be estimated. To say that an analyst who estimates the cost of a de- 
stroyer should be familiar with the characteristics of destroyers is 
a truism; however, an estimator is sometimes far removed from the
actual hardware. Further, he may be expected to provide costs for
air-to-air missiles one week and for a new antiballistic missile system
the next. The tendency in such a situation may be to use the equation that
appears most appropriate without taking the required measures to
determine whether the equation is applicable.

To illustrate the problem, assume that a new supersonic bomber is
proposed having a gross weight of 450,000 lb and a maximum speed of
1700 kn. Equation (4) may be inappropriate because the speed is far be- 
yond the range of the sample. On the other hand, no equation exists for 
aircraft in that speed range, and an estimate is required. This situa- 
tion may be regarded as the normal one, and there is no choice but to 
use what is available. In this example, Eq. (4) gives 542,000 direct 
labor manufacturing hours. 

The next step is to compare the result with other similar systems 
to see if the estimate appears reasonable. In this instance manufac- 
turing hours versus gross weight are plotted for several other large 
aircraft as shown in Fig. 5. The supersonic bomber estimate (SSB) is
substantially above the trend as it should be, because a 1700-kn air- 
frame will be more difficult to build than a subsonic airframe of the 
same size. If other information is lacking, an estimator might accept 
the figure of 542,000 hr. In this case, however, all the airframes in 
the sample were fabricated almost entirely of aluminum; an airframe 
built to withstand the heat generated by sustained flight in the atmos- 
phere at a speed of about Mach 3 will require a metal such as stainless 


steel or titanium. The question that occurs is whether the speed
variable in the equation fully accounts for this change in technology.


Fig. 5--Trend line for large aircraft


One way to answer this question is to plot a second scatter diagram,
with speed as the independent variable. Figure 6 shows labor
hours per pound of airframe weight plotted against speed with a calcu- 
lated line of best fit drawn through the scatter. If an airframe 
weight of 125,000 lb out of a gross weight of 450,000 lb is assumed, 
the estimate of 542,000 hr is equal to 4.3 hr/lb of airframe, which
not only is below the calculated trend line, but is also below any
reasonable trend line that can be drawn through the sample. (This point
is shown as SSB_1 in Fig. 6.)


[Figure: axes are labor hours per pound of airframe and maximum speed]


Fig. 6--Labor hours per pound versus maximum speed


Three possible estimates can now be considered: 542,000 hr based
on speed and weight; about 300,000 hr based on weight alone as shown
by Fig. 5; and about 925,000 hr based on speed alone as shown by the
regression line in Fig. 6 (7.4 hr/lb x 125,000 lb = 925,000 hr). More
information is needed to narrow the range. 

Although data are less than abundant, several experimental and pro- 
totype aircraft have been fabricated using stainless steel and titanium. 
On the basis of prototype experience, one manufacturer maintains that 
a titanium airframe requires twice the number of hours that an aluminum 
airframe requires; however, manufacturing hours for an aluminum air- 
frame can vary considerably. A second approach is more precise. An 
examination of actual data for different airframes with speeds of Mach 
3 and above shows that these airframes require about 1.5 times as many 
hours as the estimating relationship of Eq. (4) indicates, which implies 
813,000 hr or 6.5 hr/lb for the supersonic bomber. (This point is shown
as SSB_2 in Fig. 6.) On the basis of current knowledge, the estimate
appears to be reasonable. Further measures could be taken in the form 
of another independent estimate that uses a different estimating rela- 
tionship. An estimator does not have this option for most kinds of 
hardware, because estimating relationships are not plentiful. However, 
in the case of airframes, a number of equations have been developed 
over the years; it is good practice to use one to confirm an estimate 


made with another. 
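The competing estimates for the supersonic bomber can be laid side by side with a few lines of arithmetic. The sketch below is illustrative only. It assumes that Eq. (4) has the multiplicative form H100 = 1.45 W^0.74 S^0.43, a reading that reproduces the 542,000 hr quoted in the text; the 300,000 hr and 7.4 hr/lb figures are taken from the text's readings of Figs. 5 and 6, not computed.

```python
def hours_at_unit_100(gross_weight_lb, max_speed_kn):
    """Assumed reading of Eq. (4): H100 = 1.45 * W**0.74 * S**0.43."""
    return 1.45 * gross_weight_lb ** 0.74 * max_speed_kn ** 0.43


GROSS_WEIGHT_LB = 450_000
AIRFRAME_WEIGHT_LB = 125_000  # airframe weight assumed in the text
MAX_SPEED_KN = 1_700

equation_hours = hours_at_unit_100(GROSS_WEIGHT_LB, MAX_SPEED_KN)
base_hours = round(equation_hours, -3)  # nearest thousand, as quoted in the text

estimates = {
    "Eq. (4), weight and speed": base_hours,
    "weight alone (Fig. 5, as read in the text)": 300_000,
    "speed alone (7.4 hr/lb from Fig. 6)": 7.4 * AIRFRAME_WEIGHT_LB,
    "Eq. (4) times 1.5 (Mach 3 experience)": 1.5 * base_hours,
}

for label, hours in estimates.items():
    per_pound = hours / AIRFRAME_WEIGHT_LB
    print(f"{label:<44} {hours:>9,.0f} hr  {per_pound:4.1f} hr/lb")
```

Expressing each estimate in hours per pound of airframe, as the last column does, is what allows it to be placed on the scatter diagram of Fig. 6 and compared with the trend line.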


Judgment in Cost Estimating 


The need for judgment is often mentioned in connection with the 
use of estimating relationships. Although this need may be self-evident, 
one of the problems in the past has been too much reliance on judgment 
and too little on estimating relationships. The problem of introducing 
personal bias with judgment has been studied in other contexts, but the 
conclusions are relevant to this discussion. In brief, a person's occu- 
pation or position seems to influence his forecasts. Thus, a consistent 
tendency toward low estimates appears among those persons whose
interests are served by low estimates, e.g., proponents of a new weapon or
support system whether in industry or in government. Similarly, there
are people in industry and in government whose interests are served by 
caution. As a consequence, their estimates are likely to run higher 
than would be the case were they free from all external pressures. (In 
fairness to this latter group, however, overestimates are rare enough 
to suggest that caution is not a quality to be despised.) 

The primary use of judgment should be to decide first, whether an 
estimating relationship can be used for an advanced system, and second, 
if so, what adjustments will be necessary to take into account the ef- 
fect of a technology that is not present in the sample. Judgment is 
also required to decide whether the results obtained from an estimating 
relationship are reasonable. This does not mean reasonable according 
to a preconception of what the cost ought to be, but reasonable in a 
comparison with the past cost of similar hardware. A typical test for 
reasonableness is to study a scattergram such as Fig. 7 of costs of 
analogous equipment at some standard production quantity. The estimate 
of the article may be outside the trend lines of the scattergram and 
still be correct, but an initial presumption exists that a discrepancy 
has been discovered and that this discrepancy must be investigated. An 
analyst who emerges from his deliberations with an estimate implying 
that new, higher performance equipment can be procured for less than 
the cost of existing hardware knows that his task is not finished. If, 
after research, he is convinced that the estimate is correct, he should 


then be prepared to explain the new development that is responsible for
the decrease in cost. He should not raise the cost arbitrarily by a
percentage to make the figure appear more acceptable or because he feels
that the estimate is too low. (Such adjustments are the province of
management and are generally occasioned by reasons somewhat removed from
those discussed here.) Judgments must be based on well-defined evidence.
The only injunction to be observed is that any change in an estimate be
fully documented to ensure that the estimate can be thoroughly
understood, and to provide any information that may be needed to reexamine
the equations in the light of the new data.


Fig. 7--Cost comparison of analogous equipment


V. THE LEARNING CURVE 


FOR MANY YEARS the aerospace industry has made use of what variously 
have been called "learning," "progress," "improvement," or "experience" 
curves to predict reductions in cost as the number of items produced 
increases. The learning process is a phenomenon that prevails in many 
industries; its existence has been verified by empirical data and con- 
trolled tests. Although there are several hypotheses on the exact man- 
ner in which the learning or cost reduction can occur, the basis of 
learning-curve theory is that each time the total quantity of items pro- 
duced doubles, the cost per item is reduced to a constant percentage of 
its previous cost. Alternative forms of the theory refer to the in- 
cremental (unit) cost of producing an item at a given quantity or to 
the average cost of producing all items up to a given quantity. For 
example, if the cost of producing the 200th unit of an item is 80 per- 
cent of the cost of producing the 100th item, and if the cost of the 
400th unit is 80 percent of the cost of the 200th, and so forth, the 
production process is said to follow an 80-percent unit learning curve. 
If the average cost of producing all 200 units is 80 percent of the 
average cost of producing the first 100 units, the process follows an
80-percent cumulative average learning curve.*


*The quantities mentioned in connection with the learning concept 
presuppose the inclusion of all items. As concerns the J-79 engine 
used on the F-4 airplane, one would expect engine costs for the first 
100 F-4s to be more than that for the second 100 airplanes. Although 
this is true, what is important is that the J-79 has been used on sev- 
eral other types of aircraft, and these uses, including full spare 
engines, must be considered in learning-curve analysis. 


Either formulation of the theory results in a power function that
is linear on logarithmic grids. Figure 1 shows a unit curve for which 
the reduction in cost is 20 percent with each doubling of cumulative 
output, the upper figure showing the curve on arithmetic grids and the 
lower on logarithmic grids. The arithmetic plot illustrates that the 
percentage reduction in cost in each unit is very pronounced for the 
early units. On an 80-percent curve, for example, cost decreases to 
28 percent of the original value over the first 50 units. Over the 
next 50 units, it declines only 5 more percentage points, i.e., down to 
23 percent of unit 1 cost. The factors that account for the decline in 
unit cost as cumulative output increases are numerous and not completely 


understood. Those most commonly mentioned are 


1. Job familiarization by workmen, which results from the repeti- 
tion of manufacturing operations. 

2. General improvement in tool coordination, shop organization, 
and engineering liaison. 

3. Development of more efficiently produced subassemblies. 

4. Development of more efficient parts-supply systems. 

5. Development of more efficient tools. 

6. Substitution of cast or forged components for machined
components.


7. Improvement in overall management. 


The above list of relevant factors is not complete, and it tends to 
understate the importance of the item sometimes considered the most 
important--labor learning. Labor cost, however, cannot decline through 
experience gained by workmen unless management also becomes more
efficient. In other words, it is necessary for management to organize and
coordinate more efficiently the work of all manufacturing departments
so that parts and assemblies will flow smoothly through the plant.

Labor cost is not the only element of manufacturing that declines
as cumulative output increases. A learning curve also exists for unit
materials cost. The materials category frequently includes much purchased
equipment, which in turn includes a substantial number of engineering,
tooling, and labor hours. Unit hours decline as production quantities
increase, and the contractor who buys in successive lots is generally
able to negotiate a lower price for each lot. Decreases in raw
material costs are generally attributed to two factors as cumulative
output increases: The workmen learn to work the raw materials more
efficiently, cutting down spoilage and reducing the rejection rate, and
management learns to order materials from suppliers in shapes and sizes
that reduce the amount of scrap that must be shaved and cut from the
pieces of sheet or bar to fabricate the item of equipment. Substitution
of forgings for machined parts also reduces the amount of scrap
material.

A second factor that is probably responsible to a lesser extent for 
the decline in materials cost is the pricing policy of the raw material 
suppliers. These suppliers generally reduce the price per pound for 
the various kinds of raw materials if an order is sufficiently large. 
Although the learning curve pertains to cost reductions as materials 
are applied to successive lots and not to reductions due to volume pur- 
chases, segregation of the two effects is imperfect. This may account 
for differences observed in learning-curve slopes. 

A third major component of cost--overhead--also declines with cumu- 
lative output, but as a result of the method of allocating overhead and 
not because of a perceptible relationship between overhead rate and 
cumulative output. Direct labor hours per unit decline as cumulative 
output increases, and overhead is distributed to each unit on the basis 
of direct labor cost or hours. As a consequence, it is inappropriate
to discuss a learning curve for this element of cost.


The Log-linear Hypothesis 


The relationship between cost and quantity may be represented by
a power (log-linear) equation of the form

$$ y = ax^{b} , $$

where x equals the cumulative production quantity. The relationship
corresponds to a unit or a cumulative average learning curve according
to whether y is the cost of the xth unit or the average cost of the
first x units. The constant a is the cost of the first unit produced.
The exponent b, which measures the slope of the learning curve, bears
a simple relationship to the constant percentage to which cost is
reduced as the quantity is doubled. If S represents the fraction to
which cost decreases when quantity doubles, the equation becomes

$$ S = \frac{a(2x)^{b}}{ax^{b}} = 2^{b} , \qquad \text{or} \qquad b = \frac{\log S}{\log 2} . $$

This equation shows that for a value of S equal to 75 percent,* the
corresponding value of b is

$$ b = \frac{\log 0.75}{\log 2} , \quad \text{or} \quad -.415 . $$

*In learning-curve literature, the term slope often refers to this
percentage reduction; e.g., a 75-percent slope means a curve with a b
value of -.415.


Log-linear Unit Curve


If a production process follows a unit learning curve of the form
$y_u = ax^{b}$, the cumulative cost $y_T$ of producing the first n units is

$$ y_T = a \sum_{x=1}^{n} x^{b} . $$

The cumulative average cost $y_c$ of producing the first n units is then

$$ y_c = \frac{a}{n} \sum_{x=1}^{n} x^{b} . $$

The relationship between the unit curve and the cumulative average
curve is shown by Fig. 2. The function $y_c$ is not log-linear; however,
as x becomes larger, $y_c$ approaches asymptotically the value

$$ y_c = \frac{a}{b + 1} \, x^{b} , $$

which differs from the expression for unit cost only by the constant
factor 1/(b + 1). Consequently, if unit cost has been estimated at a
sufficiently large quantity, the cumulative average cost for the same
quantity may be approximated by multiplying the unit measure by
1/(b + 1).*
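As a rough numerical check on the unit-curve relationships above, the following Python sketch converts a slope S to the exponent b, sums a log-linear unit curve exactly, and compares the true cumulative average with the asymptotic value obtained by multiplying unit cost by 1/(b + 1). The function names are illustrative only and do not come from the memorandum; a first-unit cost of 1 is assumed.

```python
import math


def slope_to_exponent(slope):
    """Exponent b for a slope S given as a fraction: b = log S / log 2."""
    return math.log(slope) / math.log(2.0)


def unit_cost(a, b, x):
    """Cost of the xth unit on a log-linear unit curve, a * x**b."""
    return a * x ** b


def cumulative_total(a, b, n):
    """Exact cost of the first n units (sum of the unit costs)."""
    return sum(unit_cost(a, b, x) for x in range(1, n + 1))


def cumulative_average(a, b, n):
    """Exact average cost of the first n units."""
    return cumulative_total(a, b, n) / n


def asymptotic_average(a, b, n):
    """Large-quantity approximation: unit cost times 1/(b + 1)."""
    return unit_cost(a, b, n) / (b + 1.0)


for slope in (0.90, 0.75):
    b = slope_to_exponent(slope)
    exact = cumulative_average(1.0, b, 100)
    approx = asymptotic_average(1.0, b, 100)
    error_pct = 100.0 * (approx / exact - 1.0)
    print(f"slope {slope:.2f}: b = {b:.3f}, error at 100 units = {error_pct:.1f}%")
```

The approximation overstates the cumulative average, and the overstatement at a given quantity grows as the slope steepens, which is why the quantity needed for a good approximation depends on the slope.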


Log-linear Cumulative Average Curve 


When a production process follows a log-linear cumulative average
curve rather than a unit curve, the basic functional form is still
$y = ax^{b}$ but can be written $y_c = ax^{b}$, where $y_c$ is the average
cost of the first x units. The cumulative cost for producing x units is
simply $y_T = x\,y_c$, or $ax^{b+1}$, and the unit cost is obtained from
the function

$$ y_u = a\left[ x^{b+1} - (x - 1)^{b+1} \right] . $$

The relationship between a linear cumulative average curve and the
resulting unit curve is illustrated in Fig. 3. The unit curve is not
log-linear; however, as x becomes larger, $y_u$ quickly approaches
asymptotically the value

$$ y_u = a(b + 1)x^{b} , $$

which differs from the cumulative average cost equation only by the
constant factor (b + 1).
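The cumulative average formulation can be sketched the same way. In the Python fragment below (function names are illustrative assumptions, not taken from the source), the unit cost is obtained as the difference between successive cumulative totals and is compared with the asymptotic value a(b + 1)x^b.

```python
import math


def exponent(slope):
    """Exponent b for a slope given as a fraction."""
    return math.log(slope) / math.log(2.0)


def cum_avg_cost(a, b, x):
    """Average cost of the first x units on a log-linear cumulative average curve."""
    return a * x ** b


def cum_total(a, b, x):
    """Total cost of the first x units: x times the cumulative average."""
    return a * x ** (b + 1.0)


def unit_cost(a, b, x):
    """Cost of the xth unit: total for x units less total for x - 1 units."""
    return cum_total(a, b, x) - cum_total(a, b, x - 1)


def asymptotic_unit_cost(a, b, x):
    """Large-quantity approximation: cumulative average times (b + 1)."""
    return a * (b + 1.0) * x ** b


b = exponent(0.70)
for x in (2, 20, 1000):
    print(x, round(unit_cost(1.0, b, x), 6), round(asymptotic_unit_cost(1.0, b, x), 6))
```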
These equations may appear cumbersome, but in practice much of
the work involved in using learning curves has been simplified by the
preparation of tables giving the relationship between cumulative total,
cumulative average, and unit cost for a range of slopes and quantities.
Table 1 gives values for these relationships for a 70-percent curve
when a, the cost of the first unit produced, is equal to 1. To illus-
trate how such a table is used, assume a log-linear unit curve and a
quantity n of 20 units. The total cost of 20 units is approximately
7.4, the cumulative average cost of 20 units is .37, and the cost of
the 20th unit is .214, in terms of the cost of the first unit. The
unit cost of .214 appears in the dual-headed column, $y_u$, $y_c$, since a
log-linear unit curve is assumed. If a log-linear cumulative average
cost curve is assumed, this column presents the cumulative average
cost. One column serves to present both log-linear unit and log-linear
cumulative average since the functional form of the equation,
$y = ax^{b}$, is the same in either case.

*Whether a quantity is sufficiently large for the asymptotic method
to provide a good approximation depends on the slope of the learning
curve. For a 90-percent curve, the asymptotic method produces an error
of about 1 percent at quantity 100; for a 75-percent curve, the error
at quantity 100 is almost 5 percent and does not decrease to 1 percent
until a quantity of almost 2000 has been reached.

Fig. 2--Log-linear unit curve (80-percent slope)

Fig. 3--Log-linear cumulative average curve (80-percent slope)

In practice, the unit cost curve is most frequently considered to be
log-linear, but there are sufficient exceptions to suggest that the
choice must be based on past experience. Once the choice is made,
however, it is of the utmost importance to apply the technique
consistently.


                                  Table 1

                           70-PERCENT CURVE DATA

             Log-linear Unit                       Log-linear Cumulative
                                                          Average
          -----------------------    Unit or     -----------------------
Quantity  Cumulative  Cumulative   Cumulative                 Cumulative
             Total      Average      Average        Unit         Total
    x         y_T         y_c       y_u, y_c        y_u           y_T

    1      1.000000    1.000000     1.000000      1.000000     1.000000
    2      1.700000    0.850000     0.700000      0.400000     1.400000
    3      2.268180    0.756060     0.568180      0.304541     1.704541
    4      2.758180    0.689545     0.490000      0.255459     1.960000
    5      3.195027    0.639005     0.436846      0.224232     2.184232
    6      3.592753    0.598792     0.397726      0.202125     2.386357
    7      3.960150    0.565736     0.367397      0.185419     2.571777
    8      4.303150    0.537894     0.343000      0.172223     2.744000
    9      4.625979    0.513998     0.322829      0.161460     2.905460
   10      4.931771    0.493177     0.305792      0.152465     3.057925
   11      5.222928    0.474812     0.291157      0.144802     3.202727
   12      5.501336    0.458445     0.278408      0.138173     3.340900
   13      5.768511    0.443732     0.267174      0.132365     3.473266
   14      6.025688    0.430406     0.257178      0.127222     3.600487
   15      6.273896    0.418260     0.248208      0.122626     3.723113
   16      6.513996    0.407125     0.240100      0.118487     3.841600
   17      6.746721    0.396866     0.232726      0.114734     3.956334
   18      6.972702    0.387372     0.225980      0.111310     4.067644
   19      7.192481    0.378552     0.219780      0.108171     4.175816
   20      7.406536    0.370327     0.214055      0.105279     4.281095
   21      7.615284    0.362633     0.208748      0.102604     4.383699
   22      7.819094    0.355413     0.203810      0.100119     4.483818
   23      8.018295    0.348622     0.199201      0.097804     4.581622
   24      8.213180    0.342216     0.194886      0.095639     4.677261
   25      8.404015    0.336161     0.190835      0.093609     4.770870
   26      8.591037    0.330425     0.187022      0.091702     4.862572
   27      8.774462    0.324980     0.183425      0.089904     4.952476
   28      8.954487    0.319803     0.180024      0.088206     5.040682
   29      9.131290    0.314872     0.176803      0.086600     5.127282
   30      9.305035    0.310168     0.173745      0.085076     5.212359
   31      9.475873    0.305673     0.170838      0.083629     5.295988
   32      9.643943    0.301373     0.168070      0.082252     5.378240
   33      9.809373    0.297254     0.165430      0.080940     5.459180
   34      9.972281    0.293302     0.162908      0.079687     5.538867
   35     10.132777    0.289508     0.160496      0.078490     5.617357

As is evident from Table 1, large errors could result if one type of
curve were confused with the other.
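The entries of a table such as Table 1 follow directly from the two curve definitions. The short Python sketch below regenerates one row at a time for a 70-percent curve with a first-unit cost of 1; the function name and column order are assumptions made for illustration.

```python
import math

B70 = math.log(0.70) / math.log(2.0)  # exponent for a 70-percent curve


def table_row(x, b=B70):
    """Return the five tabulated values for quantity x (first-unit cost of 1).

    Order: cumulative total and cumulative average under a log-linear unit
    curve; the shared value x**b (unit cost for the unit curve, cumulative
    average for the cumulative average curve); then unit cost and cumulative
    total under a log-linear cumulative average curve.
    """
    unit_curve_total = sum(k ** b for k in range(1, x + 1))
    unit_curve_average = unit_curve_total / x
    shared = x ** b
    cum_avg_total = x ** (b + 1.0)
    cum_avg_unit = cum_avg_total - (x - 1) ** (b + 1.0)
    return unit_curve_total, unit_curve_average, shared, cum_avg_unit, cum_avg_total


for x in (1, 2, 10, 20):
    print(f"{x:3d}  " + "  ".join(f"{value:.6f}" for value in table_row(x)))
```

At 20 units the two hypotheses give totals of roughly 7.4 and 4.3, which is the kind of discrepancy that arises when one type of curve is mistaken for the other.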


Nonlinear Hypothesis 


Throughout this section it will be assumed that the log-linear 
hypothesis applies, i.e., that the learning curve is linear when plot- 
ted on logarithmic grids. It must be mentioned, however, that this is 
not the only possible formulation of the learning curve. A number of 
studies have suggested that the curve is not log-linear. One of the 
best known of these is the Stanford Research Institute investigation
of 20 World War II aircraft. The study proposed

$$ y = a(x + B)^{b} $$

as a more reliable expression of the relationship between man-hour
cost and cumulative output. The decision to find a substitute function
was apparently prompted by a visual inspection of several series that
seemed to indicate a concavity when viewed from below in the unit
learning curve.* This concavity has been recognized independently in
other studies.

However, in some cases both the labor and production cost curves
develop convexities beyond certain values of cumulative output. In the
theory of a linear unit curve, it is implicitly assumed that constituent
curves (fabrication, subassembly, and major and final assembly) are
parallel to the linear unit curve, implying that the rate of learning on
all production jobs in all departments is the same. However, it is to
be expected that the departmental learning curves could have different
slopes from each other (e.g., fabrication, 80 percent; subassembly, 75
percent; and major and final assembly, 70 percent). The sum of these
curves (the unit curve) would be convex when viewed from below and
approach as a limit the flattest of the departmental curves.

*In this context, concavity means that when plotted on logarithmic
grids the curve declines at an increasingly steep slope as it moves
away from the y-axis. In the formulation

$$ y = a(x + B)^{b} , $$

the curve becomes essentially linear as x becomes large relative to B.
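Both departures from log-linearity can be illustrated by computing the local slope on logarithmic grids, d(log y)/d(log x). The Python sketch below is only an illustration: the values B = 4 and b = -0.5 and the equal first-unit costs for the three departments are hypothetical, chosen to make the effect visible.

```python
import math


def exponent(slope):
    """Exponent b for a slope given as a fraction."""
    return math.log(slope) / math.log(2.0)


def stanford_cost(x, a=1.0, b=-0.5, B=4.0):
    """The form y = a(x + B)**b with hypothetical parameter values."""
    return a * (x + B) ** b


def departmental_sum(x, first_unit_costs=(1.0, 1.0, 1.0), slopes=(0.80, 0.75, 0.70)):
    """Unit cost as the sum of three log-linear departmental curves."""
    return sum(a * x ** exponent(s) for a, s in zip(first_unit_costs, slopes))


def local_slope(f, x, h=1e-4):
    """Central-difference estimate of d(log f)/d(log x) at x."""
    rise = math.log(f(x * (1.0 + h))) - math.log(f(x * (1.0 - h)))
    run = math.log(1.0 + h) - math.log(1.0 - h)
    return rise / run


for x in (1.0, 10.0, 100.0, 1e6):
    print(x, round(local_slope(stanford_cost, x), 4), round(local_slope(departmental_sum, x), 4))
```

The first column of slopes steepens toward b as x grows large relative to B (concave from below), while the second flattens slowly toward the exponent of the 80-percent departmental curve (convex from below).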

Much literature is available describing the bases for, and hypoth- 
eses about, learning curves, and it is beyond the scope of this section 
to attempt to cover this background material in any depth.* For this
discussion, it is stipulated that the learning curve is a useful and
accepted estimating tool, particularly in the aerospace industry, that
the log-linear curve is the one most commonly used, and that a knowl-
edge of its mechanics is indispensable to persons making or using cost
estimates.


Plotting a Curve 


In the graphical display of learning curves, the problem is to
represent the average cost for a lot or a complete contract, since typi-
cally, man-hours or costs are not recorded by unit. See, for example,
the following table:

                       Manufacturing
      Lot    Units     Hours per Lot
       1      1-10         5,830
       2     11-20         4,370
       3     21-50        10,550
       4    51-100        14,750

*There is one subject that is not discussed in the literature:
the effect of production rate on unit cost. Economic theory generally 
holds that this relationship can be described by a U-shaped function: 
First, cost declines as production rate increases; next, it is insensi- 
tive to rate over some range; and eventually, it begins to rise again. 
In learning-curve applications, on the other hand, it is assumed implic- 
itly that cost is not affected by rate of output (or that the rate is 
constant). Empirical evidence of the interaction between the volume 
and rate effects is scanty. For further discussion, see Lee E. Preston 
and E. C. Keachie, "Cost Functions and Progress Functions: An Integra- 
tion," American Economic Review, Vol. 54, No. 2, Part I, March 1964, 
pp. 100-107.

To plot a cumulative average curve from these data, the cumulative
average hours are computed at the final unit in each lot:

                 Manufacturing                        Cumulative
   Plot Point    Hours per Lot     Computation       Average Hours
       10            5,830         5,830 ÷ 10            583
       20            4,370        10,200 ÷ 20            510
       50           10,550        20,750 ÷ 50            415
      100           14,750        35,500 ÷ 100           355


The cumulative average at the 10th unit is 583 hours; this is the first 
plot point. Successive plot points are at the end of each lot, since 
these are the points where the cumulative average hour figures apply. 
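The cumulative average computation can be written compactly. The Python sketch below uses the four lots of the example; the data layout (last unit of each lot paired with the lot's hours) and the function name are assumptions made for illustration.

```python
from itertools import accumulate

# (last unit in lot, manufacturing hours for the lot), from the example above
LOTS = [(10, 5830), (20, 4370), (50, 10550), (100, 14750)]


def cumulative_average_points(lots):
    """Return (plot point, cumulative average hours) at the end of each lot."""
    last_units = [last for last, _ in lots]
    running_hours = accumulate(hours for _, hours in lots)
    return [(last, total / last) for last, total in zip(last_units, running_hours)]


for plot_point, average in cumulative_average_points(LOTS):
    print(plot_point, average)
```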

To plot the unit curve it is first necessary to compute the unit 
hours and then to establish plot points. The unit hours can be taken 
as an average for each lot: 


                                  Unit
      Lot     Computation         Hours
       1       5,830 ÷ 10          583
       2       4,370 ÷ 10          437
       3      10,550 ÷ 30          352
       4      14,750 ÷ 50          295

The lots can be represented by these unit hour values. The question
is, where should the values be plotted? To plot at the lot arithmetic 
midpoint is to assume that the learning curve can be approximated by a 
linear curve on arithmetic grids, but as suggested by Fig. 1 such a 
method of approximation only becomes reasonable for lots following a 
large number of previous units. Thus, when dealing with a log-linear 
function, the arithmetic midpoint plot produces the unequal distribu- 
tion of the area under the curve, as shown in Fig. 4. 
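A brief Python sketch of the lot averages and their arithmetic midpoints follows, together with a check of why the midpoint misrepresents an early lot. The 70-percent unit curve with a first-unit cost of 1 used in the check is an assumption for illustration, as are the function names.

```python
import math

# (first unit, last unit, manufacturing hours) for each lot in the example
LOTS = [(1, 10, 5830), (11, 20, 4370), (21, 50, 10550), (51, 100, 14750)]


def lot_average_hours(first, last, hours):
    """Average unit hours for a lot."""
    return hours / (last - first + 1)


def arithmetic_midpoint(first, last):
    """Arithmetic midpoint of a lot."""
    return (first + last) / 2.0


for first, last, hours in LOTS:
    print(first, last, round(lot_average_hours(first, last, hours)), arithmetic_midpoint(first, last))

# On an assumed 70-percent log-linear unit curve, the average cost of
# units 1-10 exceeds the curve's value at the arithmetic midpoint (5.5),
# so plotting the lot average at 5.5 places the point above the curve.
b = math.log(0.70) / math.log(2.0)
true_lot_average = sum(x ** b for x in range(1, 11)) / 10.0
curve_at_midpoint = 5.5 ** b
print(round(true_lot_average, 3), round(curve_at_midpoint, 3))
```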

The true midpoint is defined as that unit, $x_m$, which represents
the entire lot and which must also reflect the average unit cost, $y_m$,
of the lot. The total cost (or total hours) of the lot is equal to
the product of $y_m$ and the number of units in the lot, n. This product*
will approximate the area under the curve for n units (see Fig. 5).
Note that if the area under the curve is equal to $y_m n$, the two cross-
hatched areas in the figure must be equal. In fact, the exact deter-
mination of a true lot plot point for plotting purposes depends on (1)
the lot quantity; (2) the type of curve hypothesized, i.e., whether
the unit curve or the cumulative average curve is log-linear; and (3)
the true value of the slope. Therefore, these values must be known or
assumed. The first, the lot quantity, will be known. The second, the
nature of the curve, must be assumed. The third, the true value of the
slope, is actually never known, and is usually approximated based on
prior experience.

*If n represents only integers, the limits of the area must be
modified. (See H. Asher, Cost-quantity Relationships in the Airframe
Industry, The Rand Corporation, R-291, July 1, 1956, pp. 34-38.)

Fig. 4--Learning curve on arithmetic grids

Fig. 5--True lot midpoint on arithmetic grids

It is possible to ascertain the exact lot plot points for each type 
of curve over a range of slopes and quantities. However, because of 
the assumptions mentioned above that will usually have to be made re- 
garding both the type of curve and its approximate slope, in most sit- 
uations there is little need to strive for extreme accuracy. The fol- 
lowing discussion provides methods of approximation that do not involve 
the complicated calculations required to derive the true lot plot point. 

As illustrated in Fig. 5, $y_m$ is the average cost for the lot as
well as the unit cost of the lot plot point $x_m$. Therefore, tables sim-
ilar to Table 1 can be used to derive acceptably accurate plot points.
To illustrate, assume a log-linear unit curve of 70 percent, a first
lot of 10 units, and a first unit cost of 1. Then, the cumulative av-
erage cost $y_c$ of the first 10 units is .493. This average cost lies
between unit cost values $y_u$ of .568 and .490, i.e., between units 3 and
4 on the unit curve. Arithmetic interpolation yields a value for $x_m$
of slightly less than 4, which is the plot point for this particular
lot when a 70-percent log-linear unit curve is assumed. An exact solu-
tion to the plot point equation would show the true plot point for a
70-percent curve to be 3.95. Similarly, if the first unit cost is 1
and if a 70-percent log-linear cumulative average curve is assumed,
data from Table 1 yield a plot-point approximation of slightly less
than 3 (the cumulative average cost for 10 units is .306, which lies
between unit cost values of .400 and .305, i.e., between units 2 and 3
on the unit curve); the true plot point is 2.98. In this example, the
plot points vary because of the assumption that one or the other of the
curves is log-linear. This method of approximation produces accurate
first-lot plot points for all but very small lot sizes. As a general
rule, the steeper the slope and the smaller the lot size, the less
accurate this approximation method becomes.
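The table-lookup procedure amounts to linear interpolation between tabulated unit costs. The Python sketch below reproduces it for a first lot of 10 units on a 70-percent curve with a first-unit cost of 1; the helper names are illustrative assumptions, and the unit costs are computed rather than read from a printed table.

```python
import math


def exponent(slope):
    """Exponent b for a slope given as a fraction."""
    return math.log(slope) / math.log(2.0)


def interpolate_plot_point(avg_cost, unit_costs):
    """Interpolate linearly between tabulated unit costs.

    unit_costs[i] is the cost of unit i + 1; the list must be decreasing.
    """
    for i in range(len(unit_costs) - 1):
        upper, lower = unit_costs[i], unit_costs[i + 1]
        if upper >= avg_cost >= lower:
            return (i + 1) + (upper - avg_cost) / (upper - lower)
    raise ValueError("average cost lies outside the tabulated unit costs")


b = exponent(0.70)
n = 10

# Log-linear unit curve: unit costs are x**b; the lot average is their mean.
unit_costs_u = [x ** b for x in range(1, n + 1)]
point_unit_curve = interpolate_plot_point(sum(unit_costs_u) / n, unit_costs_u)

# Log-linear cumulative average curve: unit cost is the difference of totals,
# and the average cost of the first n units is n**b.
unit_costs_c = [x ** (b + 1.0) - (x - 1) ** (b + 1.0) for x in range(1, n + 1)]
point_cum_avg_curve = interpolate_plot_point(n ** b, unit_costs_c)

print(round(point_unit_curve, 2), round(point_cum_avg_curve, 2))
```

Both results fall slightly below 4 and slightly below 3, respectively, in line with the approximations described in the text.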


For the successive lots following a preceding quantity, the same
procedure can be used for approximating plot points. To illustrate,
again using Table 1, assume that a quantity of 10 units follows the
first lot of 10 units. If a 70-percent log-linear unit curve and a
first unit cost of 1 are assumed, the total cost of the second lot may
be obtained by subtracting 4.93 (the total cost of the first 10 units)
from 7.4 (the total cost of 20 units), or a difference of 2.47. This
represents an average cost of .247 for the 10 items in the lot. This
value falls between units 15 and 16 on the unit curve, and simple
interpolation gives a value of 15.1 for the plot point. If a log-linear
cumulative average curve is assumed, the approximation value of the
plot point is also 15.1. In other words, from Table 1, the difference
between the cumulative total for 20 and 10 units, 4.28 and 3.06,
respectively, is 1.22, or an average of .122 for the 10 units in the
lot. This unit cost lies between .1226 and .1185 or units 15 and 16 on
the unit curve.
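For comparison with the interpolated values, the plot point can also be solved for directly. The Python sketch below is one possible formulation, not the method used to prepare the published tables: for a log-linear unit curve the lot average is inverted in closed form, and for a log-linear cumulative average curve the unit-cost function is inverted by bisection. A first-unit cost of 1 is assumed throughout.

```python
import math


def exponent(slope):
    """Exponent b for a slope given as a fraction."""
    return math.log(slope) / math.log(2.0)


def unit_curve_plot_point(first, last, slope):
    """Solve x**b = lot average for a log-linear unit curve."""
    b = exponent(slope)
    average = sum(x ** b for x in range(first, last + 1)) / (last - first + 1)
    return average ** (1.0 / b)


def cum_avg_curve_plot_point(first, last, slope):
    """Solve x**(b+1) - (x-1)**(b+1) = lot average by bisection."""
    b = exponent(slope)
    average = (last ** (b + 1.0) - (first - 1) ** (b + 1.0)) / (last - first + 1)

    def unit_cost(x):
        return x ** (b + 1.0) - (x - 1.0) ** (b + 1.0)

    low, high = float(first), float(last)  # unit cost decreases from low to high
    for _ in range(100):
        middle = 0.5 * (low + high)
        if unit_cost(middle) > average:
            low = middle
        else:
            high = middle
    return 0.5 * (low + high)


print(round(unit_curve_plot_point(1, 10, 0.70), 2), round(cum_avg_curve_plot_point(1, 10, 0.70), 2))
print(round(unit_curve_plot_point(11, 20, 0.70), 1), round(cum_avg_curve_plot_point(11, 20, 0.70), 1))
```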

Tables to permit computation of lot plot points for a range of 
slopes and lot quantities are available in the literature.* In addi-
tion, an easier-to-use, but less accurate, approximation method will be
discussed that provides plot points for early lot quantities of less 
than 100. 

Figure 6 presents an approximation of the plot point for the first 
lot. It illustrates that substantial errors are possible when deriving 
first-lot plot points. The abscissa represents first-lot quantity and 
the ordinate the first-lot plot points associated with each quantity. 
For the upper dashed curve, a 95-percent log-linear unit curve is as- 
sumed; for the upper solid line, a 95-percent log-linear cumulative 
average curve is assumed. Similarly, for the lower lines, 65-percent 
curves are assumed. Approximation methods suitable for one type of 
curve cannot be used for another type unless extremely large quantities 
are dealt with, i.e., well beyond those shown in the figure. Figure 6 
also shows the greater sensitivity to slope exhibited by the log-linear
cumulative average curve for moderately small first lots. In addition,
it affords an opportunity to approximate quickly the range of error
that can be introduced by inappropriate plotting of the cost of the
first lot.

*See, for example, H. E. Boren and H. G. Campbell, Learning Curve
Tables, Vols. 1-3, The Rand Corporation, RM-6191-PR, to be issued.

Fig. 6--First-lot midpoints versus first-lot quantities

Figure 7 gives plot points for follow-on lots. These points rep- 
resent an average of the range obtained from 65- to 95-percent curves 
and the range obtained from a log-linear unit or a log-linear cumulative
average curve. The graph is used as follows:

   1. The first unit of the contract lot is found on the 45-deg
      line.

   2. The curve extending out from this unit is followed to the
      point on the horizontal axis that represents the last unit of
      the lot.

   3. The plot point is read off the vertical axis at that point.
      Thus, for a lot of 10 units following 10 previous units, the
      plot point would be slightly over 15.

In practice, plot points for only the first two or three lots, if these
comprise more than about 25 units, need be taken from the graph. For
succeeding lots, the arithmetic lot midpoint is usually adequate.

Fig. 7--Plot points for average costs

As a further illustration, Fig. 8 shows two sets of curves. The
lower set of curves was constructed from a series of small contract
lots, 10, 29, and 31 units. The upper set of curves was based on two
large contract lots, 100 and 500 units. With lot average costs, the
costs were plotted (1) at lot quantity arithmetic midpoints, (2) at
plot points where a log-linear unit curve for 65- and 95-percent slopes
was assumed, and (3) at plot points where a log-linear cumulative aver-
age curve for 65- and 95-percent slopes was assumed.

Fig. 8--Unit curves from contract lot averages

From Fig. 8 it can be seen that the distance between the unit curve 
constructed with the arithmetic midpoint and the unit curve constructed 
with the true plot points depends on the size of the lot quantity. The 
larger the lot quantity, the greater the distance between the midpoint 
line and the other lines. In both sets the unit curves exhibit the 
widest variation for the first lot. However, for a series of small con- 
tract lots the range of plot points is of interest only for the first 
few lots. The midpoint of even the second-lot quantity may often pro- 
vide a good approximation of the unit curve. 

It is not the purpose of this discussion to recommend any partic- 
ular technique. Rather, it is to underline that plotting representative 
unit costs for contract lots is of importance. The gross misplacement
of early points could lead to improper conclusions about cost-quantity
relationships.


Variations 


The examples used earlier tend to suggest that data points gener- 
ally fall along a straight line, as one would expect from the log-linear 
hypothesis. The truth is that plots of the type illustrated in Fig. 9 
are not unusual and that fitting a curve to these points is more than 
a matter of understanding the least-squares method of curve fitting. 
The types of plots in Fig. 9 are common enough to have been given names 
by the airframe industry. The "scallop" is generally caused by a model 
change or some other major interruption in the production process. 
Characteristic of a scallop is the abrupt rise in manufacturing hours, 
followed by a rapid decline, the basic slope of the curve remaining 
relatively unchanged. When a model change is sufficiently great, as in
the case of the change to the F-106B from the F-106A, the result is not
a scallop but a change to a new curve. In this case, a "level-off" or
"follow-on" is characteristic of the initial portion of the new curve.
This is attributed to learning from a previous model that carries over
and flattens the curve during initial production. Such an effect can
also occur when production is halted for a long period or when produc-
tion is transferred to a new facility.

Fig. 9--Illustrative examples of learning-curve slopes

"Bottoming-out" is the tendency of a learning curve to flatten
at high production quantities. It seems reasonable that at some point
no further learning should occur or that whatever slight learning does
occur would be offset by the effect of other factors. In addition, it
can be established empirically that bottoming-out has occurred in a
number of cases. There are those who argue, however, that learning
can continue indefinitely, or at least as long as the attempt is made
to obtain man-hour reductions. The classic case relates to the assem-
bly of candy boxes, in which operation the learning curve was found to
have continued for the preceding 16 years when 16 million boxes were
assembled by one person.* The problem for the estimator, of course, is
that while bottoming-out may occur in any given case, it is difficult
to predict where it will occur. One study found that for the sample of
airframes examined it was fairly typical for flattening to begin at the
300th unit,** but in the past this has not been true for many airframes.
The B-17 curve maintained a 70-percent slope out to the 6000th unit and
then exhibited a toe-up.

"Toe-ups" and "toe-downs" are the names given to the rather sharp 
rises or falls in hours that sometimes occur at the end of a production 
series. The upward trend has been explained as resulting from the 
transfer of experienced workers to other production lines, an increase 
in the amount of handwork as machines are disassembled, failure to re- 
place or repair worn tooling at the normal rate, tool disassembly, or 
a production lag at the end of a program to forestall layoffs.***
Toe-downs are thought to be caused by fewer engineering changes at the 
end of a production run and also by the ability of the manufacturer to 
salvage certain items fabricated in previous lots. 

It is important to realize that such variations in production do
occur, and not occasionally but frequently. In the analysis of man-
hour or cost data, use of the unit curve reveals these variations, and
for this reason the unit curve is generally preferred. The cumulative
average curve tends to smooth out aberrations to such an extent that
even major changes can be obscured, as shown in Fig. 10. The data
points are taken from a fighter aircraft production program that had
more than its share of problems. The solid line shows how a cumulative
average curve dampens the effect of these problems. The choice between
working with the unit or the cumulative average curve depends on the
problem. The unit curve better describes the data and is therefore
preferred. In addition, its use can aid the cost analyst in determining
whether the basic curve is best represented by a log-linear cumulative
average or unit function, what slope is most appropriate, and what
follow-on projections can be made. The log-linear cumulative average
curve is widely preferred in predictive models because of its computa-
tional simplicity, i.e., the cost of n items is simply the cumulative
average cost of the nth item times n. However, it is important to
understand all curves well enough to choose intelligently between them.

*Glen E. Ghormley, "The Learning Curve," Western Industry (now
Western Manufacturing), September 1952, pp. 31-34.

**Planning Research Corporation, Methods of Estimating Fixed-wing
Airframe Costs, Vol. I (Revised), PRC R-547A, Los Angeles, April 1967.

***Glenn M. Brewer, The Learning Curve in the Airframe Industry,
School of Systems & Logistics, Air Force Institute of Technology, Re-
port SLSR-18-65, August 1965.

Fig. 10--Smoothing effect of cumulative average curve
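The damping effect of the cumulative average can be demonstrated with synthetic data. In the Python sketch below, every number is hypothetical: unit hours follow an 80-percent log-linear unit curve with a first-unit value of 1000, and a scallop is imposed by raising units 51 through 70 by 40 percent. The function names are also illustrative.

```python
import math
from itertools import accumulate


def unit_hours_with_scallop(n=100, slope=0.80, first_unit=1000.0,
                            start=51, end=70, bump=1.4):
    """Hypothetical unit hours with a temporary rise between start and end."""
    b = math.log(slope) / math.log(2.0)
    return [first_unit * x ** b * (bump if start <= x <= end else 1.0)
            for x in range(1, n + 1)]


def cumulative_average(series):
    """Running average of the series at each unit."""
    return [total / (i + 1) for i, total in enumerate(accumulate(series))]


def largest_step_up(series):
    """Largest fractional rise from one unit to the next."""
    return max(later / earlier - 1.0 for earlier, later in zip(series, series[1:]))


unit_series = unit_hours_with_scallop()
average_series = cumulative_average(unit_series)
print(round(largest_step_up(unit_series), 3), round(largest_step_up(average_series), 3))
```

Under these assumptions the unit series shows an abrupt rise at unit 51, while the cumulative average series shows almost no rise at all, so the interruption would be hard to detect from the cumulative average plot alone.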


Applications 


The learning curve is used for a variety of purposes and in a var- 
iety of contexts; how the curve is drawn will depend on the purpose 
and the context. In long-range planning studies, for example, the curve 
must be constructed on the basis of generalized historical data, and the 
possible error is considerable. Empirical evidence does not support 
the concept of a single slope for all fighter aircraft, all solid pro- 
pellant missiles, or all spacecraft. Therefore, the practice of assum- 
ing that manufacturing hours on the airframe will follow an 80-percent 
curve (as was common for many years) or that electronic equipment will 
follow, say, a 90-percent curve, can lead to very large estimating 
errors. 

In regard to airframes, Table 2 shows the slope of the manufactur- 
ing-hour curves for 25 post World War II Air Force and Navy aircraft 
and indicates that a slope steeper than 80 percent is the rule. Since 
the learning-curve slopes of the table show important differences, it 
would be desirable to relate slope to aircraft characteristics. Such 
a relation is accomplished by a technique suggested by the Planning
Research Corporation.* Separate estimating equations based on aircraft
characteristics are derived for four different production quantities--
10, 30, 100, and 300--and a learning curve is developed from the esti-
mates at these four points. However, on a theoretical level the con-
cern is with aircraft characteristics that influence the rate of learn-
ing. It seems reasonable to expect relatively little learning for a
model that represents a small modification over a preceding type, be-
cause the previous model would have already absorbed a considerable
learning effect. On the other hand, if an aircraft contains radically
new design features, a high initial cost is to be expected, followed
by a rapid decline with increased production quantities. In other
words, it has been suggested that the "newness" of an aircraft should
be a major determinant of learning-curve slope, but explicit techniques
for taking newness into account have yet to be developed.

*Methods of Estimating Fixed-wing Airframe Costs.

                              Table 2

                 LEARNING CURVES FOR MANUFACTURING
                     (Labor for Airframe Only)

                                           Learning Curve
          Aircraft                              (%)

          Fighter .......................       77
          Fighter .......................       --
          -- ............................       74
          Fighter .......................       73
          -- ............................       78
          -- ............................       71
          -- ............................       74
          -- ............................       76
          -- ............................       77
          Fighter .......................       79
          -- ............................       82
          Fighter .......................       76
          Fighter .......................       75
          -- ............................       74
          -- ............................       76
          Bomber ........................       73
          Bomber ........................       70
          Bomber ........................       --
          Bomber ........................       79
          Cargo .........................       74
          Cargo .........................       78
          Cargo .........................       77
          Cargo .........................       75
          Trainer .......................       74
          Trainer .......................       --

          Mean ..........................       --
          Standard deviation ............       2.7

          SOURCE: G. S. Levenson and S. M. Barro,
          Cost-estimating Relationships for Aircraft
          Airframes, The Rand Corporation, RM-4845-PR
          (Abridged), May 1966, p. 56.

For estimating to be effective, therefore, learning curves must be
established on the basis of historical data relevant to the specific 
problem. Such curves are equally applicable to missiles, electronic 
equipment, aircraft, ships, and other types of equipment, but the slopes 
may be different for each of these. A recent study of avionics, for 
example, showed slopes ranging from 84 to 91 percent with a median 
value of 88 percent. If a comparison is being made between two weapon 
systems, one involving aircraft and the other missiles, the learning- 
curve slope chosen for each could play a significant part in the total 
system cost comparison. For example, the effect of using a 92-percent 
rather than a 90-percent cumulative average curve is an increase of 25 
percent in the total cost of 1500 items. As one would guess, the sit-
uation is worse when steeper slopes are involved. If a slope of 62 per-
cent instead of 60 percent is assumed, there is a 42-percent difference
in the cost of 1500 items and a 25-percent difference in the cost of 100
items.* In practice, errors of this type can be minimized by origina-
ting the curve at the estimated cost of the 100th unit rather than at
the first. Table 3 shows how this reduces the effect of a 2-percent 
change in slope on total cost. 
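The sensitivity of total cost to a 2-point change in slope can be checked with a few lines of Python. The sketch below assumes a curve originating at the first unit with a first-unit cost of 1; the function names are illustrative. It evaluates both the log-linear cumulative average and the log-linear unit hypotheses.

```python
import math


def exponent(slope):
    """Exponent b for a slope given as a fraction."""
    return math.log(slope) / math.log(2.0)


def total_cum_avg_curve(slope, n, a=1.0):
    """Total cost of n units under a log-linear cumulative average curve."""
    return a * n ** (exponent(slope) + 1.0)


def total_unit_curve(slope, n, a=1.0):
    """Total cost of n units under a log-linear unit curve."""
    b = exponent(slope)
    return a * sum(x ** b for x in range(1, n + 1))


def pct_increase(total_fn, low_slope, high_slope, n):
    """Percentage rise in total cost when the assumed slope is raised."""
    return 100.0 * (total_fn(high_slope, n) / total_fn(low_slope, n) - 1.0)


for low, high, n in ((0.90, 0.92, 1500), (0.60, 0.62, 1500), (0.60, 0.62, 100)):
    print(low, high, n,
          round(pct_increase(total_cum_avg_curve, low, high, n)),
          round(pct_increase(total_unit_curve, low, high, n)))
```

For the cumulative average curve the ratio of totals reduces to n raised to the difference between the two exponents, which is why the effect grows with both quantity and steepness.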

* The assumption regarding the type of curve is important. For example,
if a log-linear unit curve (rather than a log-linear cumulative average
curve) were assumed, these differences would be only 25 and 13 percent,
respectively.

                         Table 3

           EFFECT OF VARYING SLOPE ASSUMPTIONS

                                        Change in Total
                                        Cost of 1500
    Change in Slope                     Units (%)

    From 90% to 92%
      Origin of curve:
        Unit 1 ......................        25

    From 60% to 62%
      Origin of curve:
        Unit 1 ......................        42

    a If a log-linear unit curve is assumed,
      this value would be less than 6 percent.

Once a few data points are available either for developmental or
production items, the situation should improve, but, as illustrated by
Fig. 11, the first few points may be misleading. Suppose an estimator
had been asked to calculate the cost of a large production contract
after the fabrication of the first 30 units. By fitting a curve to the
existing data he would have projected a learning curve with an 88- or
89-percent slope and at a level considerably higher than that later
experienced. In this situation it is important to realize that such a
flat learning curve for airframe production is improbable. The
estimator should have an idea of what the answer is likely to be and
should investigate differences.
Fig. 11--Direct labor hours for a transport aircraft

With a small sample of data, where a learning curve is fitted to
a few points, the correlation may be perfect, i.e., all the points may 
lie on the fitted line, but the results can still be unreliable. The 
points used in fitting must be sufficiently numerous and reasonably 
homogeneous with the points implied by extending the curve to offer a 
reasonable probability of success in predicting costs. 
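
The point about a perfect fit on too few points can be illustrated
numerically. The sketch below is hypothetical throughout: the
"actual" hours follow an invented two-segment curve (an 89-percent
slope out to unit 30 and a 75-percent slope thereafter, loosely in the
spirit of Fig. 11), and the function names are introduced here for the
example. It fits a log-linear curve by ordinary least squares to three
early points and compares the projection at unit 500 with the assumed
actual value.

```python
import math


def fit_log_linear(units, hours):
    """Least-squares fit of log(hours) = log(a) + b * log(unit).

    Returns (a, b, r), where r is the correlation coefficient in log space.
    """
    xs = [math.log(u) for u in units]
    ys = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def segmented_hours(unit, first_unit_hours=100_000.0,
                    early_slope=0.89, later_slope=0.75, break_unit=30):
    """Hypothetical 'actual' unit hours: flat early segment, steeper later."""
    b_early = math.log2(early_slope)
    b_late = math.log2(later_slope)
    if unit <= break_unit:
        return first_unit_hours * unit ** b_early
    hours_at_break = first_unit_hours * break_unit ** b_early
    return hours_at_break * (unit / break_unit) ** b_late


# Fit to three points taken from the first 30 units only.
early_units = [5, 15, 30]
early_hours = [segmented_hours(u) for u in early_units]
a, b, r = fit_log_linear(early_units, early_hours)

fitted_slope = 2.0 ** b          # slope implied by the early points
projected = a * 500 ** b         # extrapolation of the early fit
actual = segmented_hours(500)    # value under the assumed later segment

print(f"correlation on early points: {r:.4f}")
print(f"fitted slope: {fitted_slope:.1%}")
print(f"unit 500 hours, projected: {projected:,.0f}  assumed actual: {actual:,.0f}")
```

The three early points lie exactly on the fitted line, yet the
projection overstates the later hours because the early points are not
homogeneous with the region being predicted.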

The manufacturing history of the item to be fabricated is the most 
valuable information the estimator can have. Variations from the norm 
may be caused by particular problems, configuration changes, or changes 
in manufacturing methods. In the curve of Fig. 11, the initially flat 
portion (out to the 30th airframe) is explained by the manufacturer as 
being typical of the initial production period. In this manufacturer's
experience, the curve begins to steepen when

1. Manpower has stabilized or reached its peak.
2. The engineering configuration has stabilized.
3. The parts flow has stabilized.

Thus, it may be preferable to explain certain points and exclude them
rather than to include them and bias the curve in height or slope.*

Whether to include all the points depends, in addition, on the 
anticipated use of the resulting curve. If a unit cost curve that in- 
cludes all costs and changes is desired, a line of best fit through the 
unit plot points may be appropriate. If the curve is to be used in 
negotiating a follow-on contract, the effect of changes should be elim- 
inated by constructing a curve through the lower portion of the plotted 
individual unit points, as in Fig. 12. In effect, this assumes that 
the introduction of changes raises the hours initially but that these 
decrease again to the approximate level of the original curve. 

* It is also possible to have a segmented unit curve, as implied
by Fig. 11, and several manufacturers subscribe to this concept.

Fig. 12--Effect of changes on the learning curve (unit hours plotted
against cumulative units)

Whatever the basic technique, it is important to remember that on
logarithmic grids the points at the right are usually more important
than those at the left. In visually fitting a line, the analyst should
avoid the tendency to be unduly influenced by plot points for small
early lots. Early units are often incomplete because they are used for
test purposes. It is equally possible that early units will include
certain nonrecurring problems incident to startup and for this reason
may be above the level suggested by later plot points.
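
When the line is fitted numerically rather than by eye, one way to
keep small early lots from dominating is to weight each plot point by
its lot quantity. This is offered only as a sketch of that idea, not as
a procedure prescribed by the memorandum. The lot data, the 80-percent
underlying slope, and the function names below are all hypothetical.
The two early lots are set below trend to mimic incomplete test
articles.

```python
import math

TRUE_SLOPE = 0.80                      # hypothetical underlying slope
TRUE_B = math.log2(TRUE_SLOPE)


def trend_hours(unit, first_unit_hours=100_000.0):
    """Hypothetical trend line for unit hours."""
    return first_unit_hours * unit ** TRUE_B


# (lot midpoint unit, lot quantity, factor applied to the trend value)
LOTS = [
    (2, 3, 0.80),      # small early lot, below trend
    (6, 5, 0.85),      # small early lot, below trend
    (20, 20, 1.00),
    (60, 60, 1.00),
    (150, 120, 1.00),
    (350, 280, 1.00),
]

units = [midpoint for midpoint, _, _ in LOTS]
quantities = [quantity for _, quantity, _ in LOTS]
hours = [trend_hours(midpoint) * factor for midpoint, _, factor in LOTS]


def fit_log_linear(units, hours, weights=None):
    """Weighted least-squares fit of log(hours) = log(a) + b * log(unit)."""
    if weights is None:
        weights = [1.0] * len(units)
    xs = [math.log(u) for u in units]
    ys = [math.log(h) for h in hours]
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


_, b_plain = fit_log_linear(units, hours)
_, b_weighted = fit_log_linear(units, hours, weights=quantities)

print(f"underlying slope:      {TRUE_SLOPE:.1%}")
print(f"unweighted fit slope:  {2.0 ** b_plain:.1%}")
print(f"lot-weighted fit slope: {2.0 ** b_weighted:.1%}")
```

With these invented data the lot-weighted slope lies closer to the
underlying slope than the unweighted one, because the two small early
lots carry little weight.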

Of course, variations in unit cost (or hour) data may happen for 
reasons other than the introduction of changes. An interruption in 
production can be an important factor. Interruptions may occur because 
of production cutbacks, labor disturbances, or funding problems.
Whatever the reason, if significant time periods are involved, the
learning curve will be affected in much the same way as illustrated in
Fig. 12. Those units produced after a significant interruption can be
expected to exhibit sharp increases in costs, followed by a recovery to
the approximate projected level of the earlier preinterruption period.